support for segments #60

Open
neochrome opened this issue Aug 31, 2019 · 29 comments

Comments

@neochrome

I was trying to organize my code into multiple files, each with its own responsibility, and ran into some issues with the inclusion order in the main program. If sub1.asm defines some subroutines and loads some data, and then sub2.asm does the same, the final order would be something like sub1 code, sub1 data, sub2 code, sub2 data, which may not always be desired. Instead, one might want to keep all the code together and all the data together.
Would something like this be worthwhile to implement?

Maybe it could work a bit like KickAssembler .segment/.segmentdef but simpler?

@neochrome
Author

Example:

!segmentdef ZP {from: $02, to: $ff}
!segmentdef CODE {from: $0800, to: $0fff}
!segmentdef DATA {from: $1000, to: $1fff}
...
; sub1.asm
!segment CODE
lda #0

!segment DATA
music: !byte sid_data
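To make the grouping concrete, here is a minimal Python model (not c64jasm code) of what the example asks for: each segment switch only changes which buffer receives the next bytes, so code from sub1.asm and sub2.asm ends up contiguous in CODE and their data contiguous in DATA. The class and method names are hypothetical, and `to` is treated as inclusive, matching the `{from: $0800, to: $0fff}` ranges above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Segment:
    name: str
    start: int
    end: int  # inclusive, like `to:` in the example above
    data: bytearray = field(default_factory=bytearray)

    @property
    def pc(self) -> int:
        # Next free address inside this segment.
        return self.start + len(self.data)

    def emit(self, *values: int) -> None:
        # Refuse to grow past the declared end address.
        if self.pc + len(values) - 1 > self.end:
            raise OverflowError(f"segment {self.name} overflows ${self.end:04x}")
        self.data.extend(values)


class SegmentedOutput:
    """Hypothetical model: bytes go to whichever segment is currently active."""

    def __init__(self) -> None:
        self.segments: dict[str, Segment] = {}
        self.active: Segment | None = None

    def segmentdef(self, name: str, start: int, end: int) -> None:
        self.segments[name] = Segment(name, start, end)

    def segment(self, name: str) -> None:
        # Switching segments only changes where the next bytes go.
        self.active = self.segments[name]

    def emit(self, *values: int) -> None:
        if self.active is None:
            raise RuntimeError("no active segment")
        self.active.emit(*values)


out = SegmentedOutput()
out.segmentdef("CODE", 0x0800, 0x0FFF)
out.segmentdef("DATA", 0x1000, 0x1FFF)

out.segment("CODE")
out.emit(0xA9, 0x00)  # sub1.asm: lda #0
out.segment("DATA")
out.emit(1, 2)        # sub1.asm data
out.segment("CODE")
out.emit(0xA9, 0x01)  # sub2.asm: lda #1
out.segment("DATA")
out.emit(3, 4)        # sub2.asm data

print(out.segments["CODE"].data.hex(), out.segments["DATA"].data.hex())  # a900a901 01020304
```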

@nurpax
Owner

nurpax commented Aug 31, 2019

Totally in 💪 of this feature. Been on my list for some time.

The zp part is easy to do with just variables right now but all the others make a lot of sense.

I like the object literal syntax for segment defs. :) Perhaps it could use the same keyword for both declaration and use. !segment with no from/to args could just start a new segment at the current pc.

@neochrome
Author

neochrome commented Sep 1, 2019

> Totally in 💪 of this feature. Been on my list for some time.

Cool! 😎

> I like the object literal syntax for segment defs. :) Perhaps it could use the same keyword for both declaration and use. !segment with no from/to args could just start a new segment at the current pc.

Do you mean something like this?

!segment CODE { from: $0800, to: $0fff }
!segment DATA { from: $1000, to: $1fff }

; somewhere else, maybe in another .asm file
!segment CODE
a_sub_routine:
  lda some_data
  ...
  rts

!segment DATA
some_data: !byte 0,1,2,3

I think it looks intuitive enough, using the same keyword both for declaration and use.

Another possibility could perhaps be to allow specifying a previously defined segment when declaring a scope for a label? Like this:

!segment CODE { from: $0800, to: $0fff }
!segment DATA { from: $1000, to: $1fff }

; somewhere else, maybe in another .asm file
a_sub_routine: CODE {
  lda some_data
  ...
  rts
}

some_data: DATA  {
  !byte 0,1,2,3
}

However, I don't know how that would work with file scopes, and it might also be less clear than using a specific keyword.

@nurpax
Owner

nurpax commented Sep 2, 2019

@neochrome I guess in your idea of segments, the segments would always output whatever they contain into the output PRG?

That doesn't seem to be the case in KickAssembler. Only the default segment goes into the output PRG by default. Any other segment needs to be explicitly written to an output file (or merged into the default segment).

It's not a completely bad idea to support more general segments like in KA. This enables cartridge builds, multi-part demo builds, etc. Just need to think about implementation carefully.

BTW, I agree that a specific keyword is better than adding special label syntax for segments.

@neochrome
Author

@nurpax I believe you're right about KA's default behavior. My thought was to have a slightly different default, where all output ends up in the same file.
I think that is a saner approach, and one that could later be extended with output configurations like KA's, if wanted, by adding more configuration options.

@nurpax
Owner

nurpax commented Sep 30, 2019

@neochrome BTW, I didn't give up on this. I've just been on a bit of a coding break, playing Zelda: Link's Awakening and learning Rust lang. Will certainly work on this at some point.

The easiest implementation would be where all segments are declared with a fixed starting address (and maybe size). But KA supports segments where you can say segment B starts after segment A. It's of course possible to support this, but it will automatically mean more compilation passes to work out the starting address for those "start from" segments.
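A rough Python sketch of why "starts after" segments cost extra passes, assuming a simple back-to-back layout: a segment's size can depend on addresses found in the previous pass (for example, 6502 zero-page versus absolute addressing), so the start addresses must be recomputed until they stop changing. `layout`, `size_of` and the zero-page scenario are illustrative assumptions, not how c64jasm or KickAssembler actually do it.

```python
def layout(order, size_of, base, max_passes=10):
    """Place segments back to back, each starting where the previous one ends.

    size_of(name, starts) returns a segment's size given the start addresses
    from the previous pass (an empty dict on the first pass).  Because sizes
    can change once addresses are known, repeat until nothing moves.
    """
    starts = {}
    for _ in range(max_passes):
        new_starts = {}
        addr = base
        for name in order:
            new_starts[name] = addr
            addr += size_of(name, starts)
        if new_starts == starts:
            return starts
        starts = new_starts
    raise RuntimeError("segment layout did not converge")


def size_of(name, starts):
    # Illustrative only: "code" holds one `lda label` whose label lives in
    # "data".  Zero-page addressing takes 2 bytes, absolute takes 3, and an
    # unknown address is assumed to be absolute.
    if name == "code":
        data_start = starts.get("data")
        return 2 if data_start is not None and data_start < 0x100 else 3
    return 4  # "data": four bytes


print(layout(["code", "data"], size_of, base=0xF0))  # {'code': 240, 'data': 242}
```

Here the first pass assumes absolute addressing, which puts data at $F3 (zero page); the code then shrinks, data moves to $F2, and the third pass confirms it. With a size rule that flips back and forth, the loop never settles, which is the non-convergence risk mentioned in this thread.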

@neochrome
Author

No worries at all! I've kept busy toying around with a DSL-like solution in Ruby for constructing 6502 machine language, just to try out some stuff :)

> The easiest implementation would be where all segments are declared with a fixed starting address (and maybe size).

I think the easier route is the way to go on this, both to get something going and because it's probably not that hard to choose segment ranges up front anyway...

@shazz
Contributor

shazz commented Jan 7, 2021

I'm reviving this issue because it would be a great improvement: complex applications that require detailed, fine-grained memory organization are hard to design without segments.

@nurpax
Owner

nurpax commented Jan 29, 2021

> @neochrome BTW, I didn't give up on this. I've just been on a bit of a coding break, playing Zelda: Link's awakening and learning Rust lang. Will certainly work on this at some point.

Said the author in September 2019. :) But I'm starting to pull this into my cache again, hopefully with better results this time.

@nurpax
Owner

nurpax commented Jan 30, 2021

I actually went ahead and implemented segment support. If there's anyone still around in this GitHub issue who cares, I could post my design here for a quick review.

Currently it looks like below. The syntax is kind of arbitrary, just what felt ok when I started looking at the parser. start/end arguments can take on any expressions, so you could even load the values from a JSON file if you wanted to somehow externally configure memory layout.

But the expressions used for start/end most likely must be values that do not depend on label values, because I suspect that otherwise the multi-pass forward-reference label address resolver would never converge. Or so it feels; I didn't think it through.

!segment code(start=$810, end=$820)
!segment data(start=$830, end=$840)

* = $801
    lda #0

!segment code ; use code segment
    lda #1    ; should be at address $810
    lda #2

!segment data
!byte 0,1,2,3 ; data, should go into address $830

!segment code ; emit to code segment
    lda #3
    lda #4 

This yields the following disassembly:

0801: A9 00        LDA #$00
0803: 00           BRK
0804: 00           BRK
0805: 00           BRK
0806: 00           BRK
0807: 00           BRK
0808: 00           BRK
0809: 00           BRK
080A: 00           BRK
080B: 00           BRK
080C: 00           BRK
080D: 00           BRK
080E: 00           BRK
080F: 00           BRK
0810: A9 01        LDA #$01
0812: A9 02        LDA #$02
0814: A9 03        LDA #$03
0816: A9 04        LDA #$04
0818: 00           BRK
0819: 00           BRK
081A: 00           BRK
081B: 00           BRK
081C: 00           BRK
081D: 00           BRK
081E: 00           BRK
081F: 00           BRK
0820: 00           BRK
0821: 00           BRK
0822: 00           BRK
0823: 00           BRK
0824: 00           BRK
0825: 00           BRK
0826: 00           BRK
0827: 00           BRK
0828: 00           BRK
0829: 00           BRK
082A: 00           BRK
082B: 00           BRK
082C: 00           BRK
082D: 00           BRK
082E: 00           BRK
082F: 00           BRK
0830: 00           BRK
0831: 01 02        ORA ($02,X)
0833: 03
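The zero bytes in that listing come from linking all segments into one contiguous PRG. A minimal Python sketch of that linking step, assuming chunks are given as (address, bytes) pairs and gaps are zero-filled; it is an illustration, not c64jasm's implementation, and it does not detect overlaps.

```python
def link_prg(chunks):
    """Link (address, bytes) chunks into a single PRG image.

    The first two bytes are the little-endian load address.  Gaps between
    chunks are zero-filled; overlapping chunks are not detected, and a
    later chunk simply overwrites an earlier one.
    """
    lo = min(addr for addr, _ in chunks)
    hi = max(addr + len(data) for addr, data in chunks)  # exclusive end
    image = bytearray(hi - lo)
    for addr, data in chunks:
        image[addr - lo:addr - lo + len(data)] = data
    return bytes([lo & 0xFF, lo >> 8]) + bytes(image)


prg = link_prg([
    (0x0801, bytes([0xA9, 0x00])),                                      # default segment
    (0x0810, bytes([0xA9, 0x01, 0xA9, 0x02, 0xA9, 0x03, 0xA9, 0x04])),  # code
    (0x0830, bytes([0x00, 0x01, 0x02, 0x03])),                          # data
])
print(len(prg))  # 53: 2 header bytes + $0801..$0833
```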

@neochrome
Author

neochrome commented Jan 30, 2021 via email

@shazz
Contributor

shazz commented Jan 30, 2021

I like the fact that you can split a segment and continue to append code to it :D even if I have no idea what the use case would be.

In terms of syntax and behavior, I'm pretty pleased with the proposal.
To help complete it, here are my practical use cases; I'll try to explain them without making too many technical assumptions.

Use cases / Goals

Goal 1: avoid the Tetris game (or should I say Bintris???) when coding a one-file demo that tries to maximize memory usage while respecting C64 memory constraints (banks, alignment, screen mem, I/O, ...)

Goal 2: simplify the design of a multi-part demo using OMG's Sparkle

Use case 1 details

Let's start with the most obvious one (Goal 1).

  1. The story is simple: I start coding a little demo effect without worrying too much about where I put the code, the sprites, the music...
  2. Then I add another effect I already coded... and ah, it is not that easy: some things are set at the same memory location (typically gfx data), or they overlap, or they don't all fit in the same $d018 setup.
  3. And this is the beginning of the Tetris game: I move parts of the code, set an arbitrary * = SPRITE_DATA, change SPRITE_DATA to make it fit... and c64jasm starts to complain that one * = cannot come before another, and so on.
  4. So I need to cut/paste code up and down and start again.

With segments, I'm fairly sure the Tetris game will be easier; at the very least, there will be no need to physically move parts of the listing up and down.

Use case 2 details

Now the second goal, which may also have an impact on PRG generation.

In brief, Sparkle is an IRQ loader that builds a disk image and depacks/loads data (code or otherwise) into memory when requested. Sparkle doesn't require changing or adapting the code; it is a second-pass process.

Using Sparkle (via its GUI, which is Windows only :(, or manually), you only have to write a script that describes your multi-part application. Here is a simple example I built recently based on 2 demo effects I wrote with c64jasm:

[Sparkle Loader Script]

Path:	trsi
Header:	CSDB Compo
ID:	trsi
Name:	SpritesOnly
Start:	1e00
DirArt:	dirart.txt
IL0:	05
IL1:	03
IL2:	03
IL3:	03
ZP:	10
Loop:	0

Script:	sequencer\sequencer.sls

File:	bigsprite\data\skull5.bin	2800
File:	bigsprite\bin\bigsprite.prg	0834	0035	076c
File:	bigsprite\bin\bigsprite.prg	4000	3801	014f
File:	bigsprite\bin\bigsprite.prg	7000	6801	007f

File:	multiplexer\data\cubes.bin	2000
File:	multiplexer\bin\multiplexer.prg	0951	0152	013d
File:	multiplexer\bin\multiplexer.prg	c500	bd01	07d7

Without going into all the details of an sls script, let's look at how the 2 demo parts are defined.

Part 1: BigSprite

This part consists of:

  • sprite gfx data (skull5.bin) which has to be loaded at $2800 (screen mem)
  • the init code (bigsprite.prg) which has to be loaded at $4000. (jmp to there is managed by the Sparkle sequencer)
  • the irq code (bigsprite.prg) which has to be loaded at $0834
  • the tables generation code (bigsprite.prg) which has to be loaded at $7000

Part 2: Multiplexer

This part consists of:

  • sprite gfx data (cubes.bin) which has to be loaded at $2000 (screen mem)
  • the init code (multiplexer.prg) which has to be loaded at $0951. (jmp to there is managed by the Sparkle sequencer)
  • the irq code (multiplexer.prg) which has to be loaded at $C500

Fortunately, Sparkle can handle offsets when the cross-assembler doesn't support segments: it will extract the required slice from any file (in my case, the PRG that aggregates the various segments). This is what the 2 additional parameters mean:

File:	multiplexer\bin\multiplexer.prg	0951	0152	013d

=> From multiplexer.prg, extracts data starting at offset 0x152 for 0x13d bytes and loads it at $0951
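The offsets in the script are consistent with both PRGs loading at $0801: for example, 2 + ($0951 - $0801) = $0152, where the 2 accounts for the PRG load-address header. A small Python sketch of generating such a line from a segment's load address and size (which could come, for instance, from start/end labels in the c64jasm labels file); the function name and the $0801 default are assumptions.

```python
def sparkle_file_line(path, load_addr, size, prg_start=0x0801):
    """Build a Sparkle `File:` line that slices one segment out of a PRG.

    The offset skips the 2-byte load-address header, so a segment loaded
    at load_addr starts at file offset 2 + (load_addr - prg_start).
    """
    offset = 2 + (load_addr - prg_start)
    return f"File:\t{path}\t{load_addr:04x}\t{offset:04x}\t{size:04x}"


print(sparkle_file_line(r"multiplexer\bin\multiplexer.prg", 0x0951, 0x013D))
print(sparkle_file_line(r"multiplexer\bin\multiplexer.prg", 0xC500, 0x07D7))
```

These two calls reproduce the two multiplexer.prg lines of the script above (offsets 0152 and bd01).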

So, using a Python script and the labels file generated by c64jasm, I could automatically generate those offset/size parameters for each segment. That works, but it would be soooo much better if c64jasm could generate a separate PRG (Sparkle can use its first 2 bytes to get the start address) or binary file for each segment.
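If the assembler wrote one PRG per segment instead, each file would just be the segment's bytes prefixed with its little-endian load address, which is what lets Sparkle read the start address from the first two bytes. A hypothetical Python sketch; the `<name>.prg` naming and the input shape are assumptions.

```python
def segment_prg(load_addr, data):
    """A PRG file: 2-byte little-endian load address followed by the bytes."""
    return bytes([load_addr & 0xFF, load_addr >> 8]) + bytes(data)


def split_outputs(segments):
    """Map {name: (load_addr, data)} to {"<name>.prg": prg_bytes} (hypothetical naming)."""
    return {f"{name}.prg": segment_prg(addr, data) for name, (addr, data) in segments.items()}


outputs = split_outputs({
    "irq": (0xC500, bytes([0xEA, 0x60])),  # nop; rts
    "init": (0x0951, bytes([0x60])),       # rts
})
for filename, prg in outputs.items():
    print(filename, prg.hex())
```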

I hope my 2 use cases make sense; feel free to comment if anything is unclear.

Comments

Last point, not (yet) a requirement, but something I found interesting while trying the CC65 linker with relocatable segments: it let me split my code into different files (and avoid 10 km of code in one file), using the .export/.import directives to define global labels.
The linker then resolves the labels automatically.

@shazz
Contributor

shazz commented Jan 30, 2021

!segment code(start=$810, end=$820)
!segment data(start=$830, end=$840)

Since end looks mandatory (I was wondering whether that makes sense; I'm not totally sure yet, but I'm starting to think it's better to fix it, even if I don't always know in advance how big the code in this segment will be), will the assembler report, for each segment, how much is used / left?

* = $801
lda #0

Should the entry point be defined as an org, or could it be a parameter of !segment?

@nurpax
Owner

nurpax commented Jan 30, 2021

end is currently mandatory but it's pretty easy to work out during assembly too. It might work to make it optional and just check that the segment doesn't grow over some other segments.

The start of the segment is trickier to figure out. For example, something like this:

!segment code () ; neither start nor end specified
!segment data () ; data implicitly starts after code

Figuring out the start of data may lead to some sort of multipass explosion. I'm not entirely sure :)

@nurpax
Owner

nurpax commented Jan 31, 2021

@shazz how do you feel about the keyword arg syntax?

I realize that it is a bit inconsistent with older features like !binary:

!binary "file1.bin"       ; all of file1.bin
!binary "file2.bin",256   ; first 256 bytes of file
!binary "file2.bin",256,8 ; 256 bytes from offset 8

I've been coding so much Python lately that this

!segment code(start=$810, end=$820)
!segment data(start=$830, end=$840)

felt like the obvious first choice. But @neochrome above used JS object notation. In some sense that's more in the spirit of c64jasm and JS.

Feels like this keyword syntax might in fact be something that could be retrofitted for things like !binary, like:

!binary ("file", length=256, offset=4)

@shazz another q: re multiple output files. So something like this:

!segment code(start=$810, end=$1000)
!segment code_b(output="b.bin", start=$810)

With !segment, output would default to what's specified on the command line with --out.

I think overlapping segment address ranges would be forbidden. But probably only if the overlap happens within a single output?

@shazz
Contributor

shazz commented Jan 31, 2021

About !binary

I really prefer the keyword syntax. I always thought you kept the !binary "file2.bin",256,8 syntax to resemble old cross-assemblers.
I don't really like it; I never remember what 8 and 256 mean, or in which order. So if you have the time and motivation for some spring cleaning, that would make c64jasm more consistent (and let's forget the old bad habits).

Segments

About the segment proposal, I like the optional output parameter for generating a separate segment PRG/bin.

Dict vs Function notation

About dict notation vs. function notation: I'd say I prefer function notation for actions (like !binary) and dict notation for configuration. So for segments, if you can get rid of the separate top-level definition and just do:

!segment code(start=$810, end=$820)
    lda #1    ; should be at address $810
    lda #2

!segment data(start=$830, end=$840)
!byte 0,1,2,3 ; data, should go into address $830

It looks fine to me but the segment split/append becomes more complicated (even if I don't think I will use it).
If you prefer the top segment definition, yeah, maybe dict notation looks more natural to me.

In my Python meta-cross-assembler, here is how I define a segment, using a context manager:

with segment(0x0801, "CODE") as s:
   ...

But really, dict or function, both are ok for me. Your call.

@nurpax
Owner

nurpax commented Jan 31, 2021

I like the function syntax better too.

Re def vs. use. KickAssembler made a distinction between declaring a segment and using it. I also definitely see a use for being able to alternate between segments (like my example code -> data -> code), even though I guess you didn't quite see the point. Neochrome's first comment indicates that he was specifically looking for this type of alternating segment support, and I've had a need for this myself.

I thought the !segment (...) would always be a declaration, and later !segment <name> would be a use. KickAssembler has segmentdef and segment for these but I'd prefer just a single keyword.

Alternatively it could be that the first occurrence of something like

!segment code(start=$810, end=$820)

both defines a new segment, and marks it active. So it'd be equivalent to:

!segment code(start=$810, end=$820)
!segment code

Maybe? OTOH, this will have the problem that if someone does this:

!segment code(start=$810, end=$820)
!segment data(start=$1000)

; ok time to code
main: lda #0

their code would go into the data segment..

@shazz
Contributor

shazz commented Feb 1, 2021

I think your first idea is fine: !segment (...) to define, !segment <name> to use. Less confusion.

@shazz
Contributor

shazz commented Feb 1, 2021

A question that came up in the shower this morning: segments are also useful for splitting a huge codebase (particularly unmanageable in assembly) into small chunks. Will it be possible to include the segment definitions in each segment's file?

@shazz
Contributor

shazz commented Feb 2, 2021

If the segments are split across multiple files, what will the CLI look like?

Something like that?

c64jasm --disasm-file hello_world.lst --labels-file hello_world.labels --out hello_world.prg START.asm CHARSET.asm

@nurpax
Owner

nurpax commented Feb 2, 2021

Good question. I actually always thought the segments would still be included through some common .asm file. But if multiple files on the command line feels good (I haven't decided), this could be treated as the same as including all of the listed files in a single (implicit) asm file.
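Treating several command-line files as one implicit source amounts to generating a wrapper that includes them in order. A tiny Python sketch of that equivalence; the function is hypothetical, while `!include` is the existing c64jasm directive used elsewhere in this thread.

```python
def implicit_main(files):
    """Source text equivalent to assembling `files` in command-line order."""
    return "".join(f'!include "{name}"\n' for name in files)


print(implicit_main(["START.asm", "CHARSET.asm"]), end="")
```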

This raises another question: should there be some sort of --outdir option too? If compilation can output multiple .prg and .bin files, it'd be nice if they didn't get saved under the source dir.

@nurpax
Owner

nurpax commented Feb 6, 2021

Sorry @neochrome and @shazz, the below update is a bit long. I tried to summarize the current design in case you have some suggestions on how to improve it. It feels pretty good to me so far.

I checked in some work on segments: ea95e02

It still has some bugs and missing features:

  • outputs only the --out out.prg, this will be extended (see notes below)
  • error checking: not validating that segments don't overlap
  • segment start/end values must be computable in the first compilation pass (see example below)
  • scoping rules for segment declaration/use?

Allowable parameters for the segment start argument

The start argument expression must be constrained to accept only values that can be resolved in the first compilation pass. I think there are cases that would otherwise cause the multipass compiler to never converge. Something like this:

!segment code(start=foo)

  jmp foo
!segment code
  lda #1

foo:
  lda #0

I can't quite wrap my head around what should even happen in this case.

But the start/end expressions would be otherwise just normal expressions. So you could even read their values from say a JSON file. Something like this would be allowed:

  lda #0
foo:
  lda #1

!segment code(start=foo+2)
!segment code
  lda #3

This would generate:

  lda #0
  lda #1
  lda #3

A start=foo would generate an error as it'd cause code to overlap with the default segment.
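The first-pass restriction can be sketched as a lookup that only accepts labels already resolved when the first pass reaches the `!segment` line, and rejects forward references such as `foo` in the first example. This is a hypothetical Python model; it assumes the second example starts at $0801, so `foo` is $0803 and `start=foo+2` gives $0805.

```python
class ForwardReferenceError(Exception):
    """Raised when a segment argument depends on a not-yet-defined label."""


def eval_segment_start(label, offset, first_pass_symbols):
    """Evaluate a `start=label+offset` argument using first-pass values only.

    first_pass_symbols holds labels already defined when the first pass
    reaches the !segment line; anything else is a forward reference.
    """
    if label not in first_pass_symbols:
        raise ForwardReferenceError(f"'{label}' is not defined before this !segment")
    return first_pass_symbols[label] + offset


symbols = {"foo": 0x0803}  # lda #0 at $0801, so foo (lda #1) is at $0803
print(hex(eval_segment_start("foo", 2, symbols)))  # 0x805
```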

Default segment and default output

By default everything goes into an implicit "default" segment that gets saved into the output prg specified on the command line with --out. (Very much like in KickAssembler).

Segments without any output declarations will be saved in the same default output too. (Unlike in KickAssembler that just throws away segments that do not specify an output.)

E.g.,

!segment code(start=$1000)
  * = $801
  lda #0
  sta $d020

!segment code
  lda #0  ; this will go to address $1000

The output will contain binary from $801 (start of default segment) up to $1002 (end of code segment).

Multiple outputs

The plan is to add an argument like out in !segment code(out="foo.prg", start=$1000) that means anything going into that segment will not be saved to the "default" output specified by --out but into foo.prg.

CLI must be extended with --outdir flag so that the above foo.prg can be written to some build dir instead of the current directory.

However, I'm not a big fan of sticking filenames in source files. So I'd at least like a command-line override, something like c64jasm --segment code.out="bar.prg". This makes build scripting more flexible.
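One possible precedence for picking a segment's output file, sketched in Python under the assumptions of this proposal: a `--segment code.out=...` override wins over `out=` in the source, which wins over the default `--out` file, and `--outdir` is applied only to segment-specific outputs. All function names are hypothetical, and `--segment` and `--outdir` are only proposals in this thread.

```python
import posixpath


def parse_segment_override(arg):
    """Parse a proposed `--segment` value such as code.out="bar.prg"."""
    key, sep, value = arg.partition("=")
    seg, _, attr = key.partition(".")
    if not sep or attr != "out" or not seg:
        raise ValueError(f"unsupported segment override: {arg!r}")
    return seg, value.strip('"')


def resolve_output(segment, segment_out, overrides, default_out, outdir=None):
    """CLI override > out= in the source > default --out file."""
    name = overrides.get(segment, segment_out)
    if name is None:
        return default_out  # no explicit output: goes into the --out PRG
    return posixpath.join(outdir, name) if outdir else name


overrides = dict([parse_segment_override('code.out="bar.prg"')])
print(resolve_output("code", "foo.prg", overrides, "out.prg", outdir="build"))  # build/bar.prg
print(resolve_output("data", None, overrides, "out.prg", outdir="build"))       # out.prg
```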

Scoping rules

Scoping for segments. Right now they follow the same scoping rules as everything else, including relative scope references and nested scope names. But does this make much sense? E.g.,

!if (foo) {
!segment code(start=$1000) ; in anonymous scope
} else {
!segment code(start=$2000)
}
!segment code ; NOPE, code was declared in an anonymous scope above, so this fails

Of course you could write the above like this:

!let s = $1000
!if (!foo) {
!! s = $2000
}
!segment code(start=s)

Similarly you could now do something like this:

file_x.asm:

!filescope foo
!segment code(start=$1000)

main.asm:

!include "file_x.asm"

!segment foo::code  ; switch to segment defined in file_x.asm

Not sure if this is useful or even desired. Maybe it is?

@neochrome
Author

Sorry for the late feedback - been quite busy with work etc.
First, I like the proposed syntax: it's very clear how to define the start/end of segments and how to switch the active one. I guess some kind of error could be raised if one puts more into a segment than fits?

I think it would be good enough (at least in a first version) to have to specify separate segment output(s) on the CLI if need be. Not sure how useful it would be to specify overlapping segments (as long as they go to different outputs); it might be allowed by KA, but again, I'm not sure of a good use case.
Another way of catering for that might be to allow the same segment to be output to multiple different files by specifying multiple outputs on the CLI, and have them end up in address order in the file (of course).

With regards to scoping, I think keeping it simple and predictable (for the user) is important.
For me, IIRC, the main purpose was to be able to define segments in the main file and refer to them from other files, in order to put stuff in place in an organized way :)

I played around a bit with segment defs in my little Ruby DSL for generating 6502 binary code - I'll have a look at how I tackled some of these cases there and see if I find something more.

nurpax added a commit that referenced this issue Feb 6, 2021
…es (#60)

Machinery to accept only expression values in segment start/end that
are possible to compute in the first pass of compilation.

This should handle most cases, although I guess conditional compilation
could still leak forward label values into segment arguments.  Not sure
if there's a reasonable way to fix this.
@nurpax
Owner

nurpax commented Feb 6, 2021

Thanks for the comments!

> Not sure how useful it would be to be able to specify overlapping segments (as long as they go to different outputs) - might be allowed by KA, but again not sure of a good use case.

One use case would be if someone wants to build multiple .prg files (say, a multipart demo) with a single command-line invocation. Might be handy if you just want to kick off a c64jasm --watch src for the whole demo project.

> With regards to scoping, I think keeping it simple and predictable (for the user) is important.

I think the current implementation is fine now. It follows the same scoping as variables and other symbols.

Apart from some minor error checking and multiple prg output, the feature is pretty much done.

@shazz
Contributor

shazz commented Feb 6, 2021

I agree with @neochrome.

In details:

  • limit on the start parameter: makes sense to me. I can't think of a case where a segment's start address would need to be defined after the fact
  • default segment and output: makes sense too
  • multiple outputs: yes, having the filename possibly set/overridden on the CLI looks better
  • scoping rules: ah... interesting... conditional segments... I think the start value may be conditional but not the segment itself (so it shouldn't be declared in a nested scope), and the workaround works well.

Comments:

  • if segment outputs are specified, that won't prevent building the full PRG, right? At least to debug :)
  • a segment overlap check will be good to have, but if an overlap happens, what will the result be? Will segment 1 be overwritten by segment 2? That could be hard to figure out.
  • it would be nice if assembling printed a little segment report: every segment with its start and end address, plus the unused bytes in the segment if its end address is fixed
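A sketch of the suggested report in Python, assuming each segment is described by its name, start, number of bytes emitted, and an optional end (treated here as exclusive, which the thread does not specify). The function name and output format are hypothetical.

```python
def segment_report(segments):
    """segments: iterable of (name, start, used, end); end is exclusive or None."""
    lines = []
    for name, start, used, end in segments:
        span = f"${start:04x}-${start + used - 1:04x}" if used else f"${start:04x} (empty)"
        line = f"{name:<8} {span}  used {used}"
        if end is not None:
            free = end - start - used
            line += f", free {free}" if free >= 0 else f", OVERFLOW by {-free}"
        lines.append(line)
    return lines


for line in segment_report([("code", 0x0810, 8, 0x0820), ("data", 0x0830, 4, None)]):
    print(line)
```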

But overall it looks perfect to me. As far as I can tell right now, I don't need much more than @neochrome does (define/split/organize), plus segment outputs for Sparkle :)

@neochrome, funny, I did the same, but in Python rather than Ruby :)

@shazz
Contributor

shazz commented Feb 6, 2021

Btw, building the branch generates some warnings:

added 367 packages, and audited 434 packages in 6s

14 vulnerabilities (12 low, 2 high)

Is it.. important ?

@nurpax
Owner

nurpax commented Feb 6, 2021

Thanks again!

> if segment outputs are specified, it won't prevent to build the full PRG right ? At least to debug :)

Can you expand on this? I don't understand.

> segment overlapping check will be good to have, but if it happens what will be the result ?

Default behavior would be to treat this as an error if segments going to the same output file overlap. Overlapping segments with a different destination would be fine (otherwise you couldn't really build multiple prg outputs).
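That rule (overlaps are errors only within the same output file) could be checked with a sort-and-sweep per output. A minimal Python sketch with hypothetical names, assuming exclusive end addresses; it tracks the furthest end seen so far, so a segment nested inside a larger one is also caught.

```python
from collections import defaultdict


def find_overlaps(segments):
    """segments: iterable of (name, output, start, end) with exclusive end.

    Returns (output, earlier, later) for each segment that starts before an
    earlier segment in the same output file has ended.
    """
    by_output = defaultdict(list)
    for name, output, start, end in segments:
        by_output[output].append((start, end, name))
    clashes = []
    for output, ranges in by_output.items():
        ranges.sort()
        max_end, owner = None, None
        for start, end, name in ranges:
            if max_end is not None and start < max_end:
                clashes.append((output, owner, name))
            if max_end is None or end > max_end:
                max_end, owner = end, name
    return clashes


print(find_overlaps([
    ("code", "out.prg", 0x0810, 0x0820),
    ("data", "out.prg", 0x0818, 0x0830),  # overlaps code in the same file
    ("code_b", "b.bin", 0x0810, 0x1000),  # same addresses, different output: fine
]))
```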

@nurpax
Owner

nurpax commented Feb 6, 2021

> Is it.. important ?

I've been conditioned to ignore these due to GitHub's dependabot spam that I've been getting for the past 1-2 years now. Should clean those up at some point.

@shazz
Contributor

shazz commented Feb 6, 2021

For outputs, I meant that the assembly process will produce:

  • every segment's binary in a separate file (possibly a PRG, to get the start address in the header) if requested (--out, ...) => used by the IRQ loader to sequence parts
  • the usual PRG with all segments linked in order and at their locations => used by me to debug my parts :)

nurpax added a commit that referenced this issue Feb 6, 2021
instead of the hard to remember !binary syntax of:

!binary "file1.bin"       ; all of file1.bin
!binary "file2.bin",256   ; first 256 bytes of file
!binary "file2.bin",256,8 ; 256 bytes from offset 8

..add keyword args support so that this is also legal:

!binary (file="binary1.bin", offset=0)
!binary (file="binary1.bin", offset=2, size=2)
!binary (file="binary1.bin", size=4)
!binary (file="binary1.bin", offset=2)

While at it, handle binary output truncation in case offset+size reaches
beyond the end of the file.
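The keyword form maps naturally onto slicing. A minimal Python sketch, assuming "truncation" means the result is clipped at the end of the file rather than raising an error; the function name is hypothetical, and the legacy `!binary "file2.bin",256,8` corresponds to `size=256, offset=8`.

```python
def binary_slice(data, size=None, offset=0):
    """Keyword-style !binary: `size` bytes starting at `offset` (all if size is None).

    Python slicing stops at the end of the data, so offset+size past EOF
    yields a truncated result instead of an error.
    """
    end = len(data) if size is None else offset + size
    return data[offset:end]


data = bytes(range(10))
print(binary_slice(data, offset=8, size=5).hex())  # 0809
```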
nurpax added a commit that referenced this issue Feb 7, 2021
Kind of work-in-progress -- this will not catch all
segment overlaps.  At least overlaps with the current
default segment are not checked.  Revisit when adding
separate prg outputs?