Complete noob trying futilely to work on a bug, please save me from myself


Khromak

Hopefully there are still some good NES programmers out there who can help me out with a little side project I decided to take on.

I could explain all the debugging and troubleshooting I did to get to this point, both so you can laugh at my expense and also so you could help me figure out what's ACTUALLY going on here, but I'm going to assume, for the purposes of my question, that everything I think I know about what's happening in the code is true.

That said, I'm really getting lost at this point in the code (I've included a screenshot from the Mesen debugger). Presumably the value from location 65B9 is supposed to go into 04A3. In the current game state that value is $75, which makes sense logically (trust me on that one). But somehow the very next instruction, which takes the value of the A register and stores it in 04A3, ends up writing $B1. In the Mesen debugger I can even see that the A register holds $75, and it stays $75 as I step through these lines of code.

My current guess is that there's some simultaneous processing happening at the same time which is also accessing the A register and screwing it up between these two instructions.

So uh. Is that happening? If so, what can be done about this? For what it's worth, sometimes this code works flawlessly dozens or maybe hundreds of times in a row, but then (seemingly) randomly it will get messed up.

Any advice is appreciated, if you want more details about the background I'm happy to provide any/all background that may help. It's not some great secret or anything, I just left those details out to avoid making this wall-o-text even worse.

[Screenshot: MesenDebugger.png]

7 hours ago, Khromak said:

My current guess is that there's some simultaneous processing happening at the same time which is also accessing the A register and screwing it up between these two instructions.

The NES doesn't do "simultaneous" processes, but what can happen between two instructions is an interrupt firing (most likely the NMI routine), which is why you typically want to push the A, X, and Y registers to the stack at the beginning of an interrupt, and pull them back in reverse order before returning from it. (Don't worry about the flags register; the CPU pushes it when the interrupt starts and RTI restores it.)

Usually an interrupt will start with code like this:
 

	pha		; save A
	txa
	pha		; save X (via A)
	tya
	pha		; save Y (via A)

and end like this:
 

	pla
	tay		; restore Y
	pla
	tax		; restore X
	pla		; restore A
	rti		; restores flags and the return address

 

First of all, though, I'd make sure that's what's actually happening. Since you say you can see the value of A actually being set to $75, I'm assuming you are stepping through the code and seeing what happens.

I'm not sure how you went about that, but what you want to do is put your breakpoint at the "LDA" line in the Mesen debugger, and then "step into" (F11 key) over each line and see where it takes you. If any interrupt happens between the two lines, the debugger should take you there, and make clear what happens.

If it just takes you to the next "STA" line, that's not your issue. "Step into" one more line to see the memory values change. It's worth remembering that Mesen splits opcodes into individual steps (like the CPU does) and emulates each part of them individually, so the target memory address ($04A3, assuming X is $05) doesn't get updated the moment it hits that line, but only right before it starts processing the next one.
If you have the debugger stopped right after line $EE62 completed, and nothing else happened in between those two lines, the value definitely should be $75 in both addresses.

Thanks for the explanation @Sumez, you are right about a lot of things there. What happened in the screenshot above is that I hadn't advanced far enough: the LDA wasn't finished yet. As it turns out, the problem is actually elsewhere in the code. I still don't understand how to fix it, because it goes deep into the code and involves memory locations which are used by a whole bunch of other code and change dynamically.

The bug I'm trying to resolve happens when the "seed" values of A going into this are incorrect, because those memory addresses are being affected by some other code at the time.

This is probably just way above my paygrade and not something I can realistically figure out.

Here's what happens before the affected code, and this is an example where it's messed up. EE4E and EE53 are supposed to "set up" a location to pull the value I'm concerned with from and in this case the underlying conditions are causing it to be off.

I'm working with a hack of TLOZ and this will naturally happen if I just leave the game long enough, but I've noticed that these values can also get screwed up by Link being hit or moving Link in certain ways.

I think I have one other scenario I can use as an edge case where the game doesn't freeze so I can try to find "normal" behavior, but at some point fairly soon I'm going to give up as I'm in over my head 😄

[Screenshot: Codesnippet.png]

Ah yeah, messing with other people's assembled code is definitely a lot more confusing and error-prone than making your own.

I'm not sure exactly what the problem you're describing is, but it sounds like you might be able to get wiser on it if you familiarize yourself better with Mesen's debugger and learn some of the tools it provides. For example, you can open the memory viewer to see what happens with all the memory on the console as it happens, and set breakpoints for whenever any code reads or writes to/from an address, and you can even set up conditions such as only break when the value being written falls within a certain range, etc.

Yeah I've got a bunch of breakpoints like that set up and I've been able to isolate what's happening, more-or-less.

There's a room with Lanmola in it and sometimes when they run into a corner they get stuck. Their "direction" is set at a specific address and they randomly change direction from 75, 76, 77. Sometimes the direction gets set to junk like B0, B1, B2. When they're in an open room it just resolves itself and they just turn out, but when they're stuck in a corner it will get stuck rotating through B1/B2 and lock up.

I suspect it has something to do with logic saying if it's not in that list then just go sideways, to correct for when they run into walls, but there is a ton of code which executes between those and nothing sets or references those directions.

Probably going to put it on the back burner for now, it was actually a fun exercise for a little bit though. I appreciate your help!

I have faith you can do it. Just keep at it and you'll slowly learn more. It took me forever to reverse engineer a simple system when I started, but now that I've done it several times I've gotten much better at using the tools.

One of the best tools I've discovered is the trace logger. Just click start logging and it records every instruction that is executed. This is useful for seeing what the code is doing before it hits your breakpoint. Although maybe it isn't necessary if you've already found the place where the bug is happening.

Another useful thing is conditional breakpoints. You can set breakpoints to only trigger under certain conditions (for example if A > 5). This can help if you want to catch when A is an "incorrect" value. EDIT: lol didn't read that Sumez already suggested this.

Edited by 0xDEAFC0DE

7 minutes ago, Khromak said:

Yeah I've got a bunch of breakpoints like that set up and I've been able to isolate what's happening, more-or-less.

There's a room with Lanmola in it and sometimes when they run into a corner they get stuck. Their "direction" is set at a specific address and they randomly change direction from 75, 76, 77. Sometimes the direction gets set to junk like B0, B1, B2. When they're in an open room it just resolves itself and they just turn out, but when they're stuck in a corner it will get stuck rotating through B1/B2 and lock up.

I suspect it has something to do with logic saying if it's not in that list then just go sideways, to correct for when they run into walls, but there is a ton of code which executes between those and nothing sets or references those directions.

Probably going to put it on the back burner for now, it was actually a fun exercise for a little bit though. I appreciate your help!

So what I would do in this scenario is set a conditional break for when the direction is set to an incorrect value, then turn on the trace logger, and finally try to reproduce the bug. When the breakpoint triggers, look through the trace logger and try to determine where this incorrect value is coming from. After you start to get an idea of how it works, you'll also want to compare that to a working execution of the code.
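
Since full trace logs run to thousands of lines, one way to do that comparison without reading both by eye is to let a script find the first instruction where the two runs take different paths. Here's a rough Python sketch, not a polished tool. It assumes FCEUX-style trace lines (register dump first, then `$ADDR:bytes`), and it assumes both logs start at the same instruction, for example from a breakpoint at the top of the routine you care about. The function names are made up for illustration, and Mesen's log format is configurable, so the regular expression would need adjusting for it.

```python
import re

# Assumed line format (FCEUX-style), e.g.:
#   A:05 X:00 Y:08 S:F9 P:nvUbdizc       $B7B2:A0 00     LDY #$00
# The first "$XXXX:" after the register dump is the program counter.
# Lines that do not start with a register dump (such as
# "Breakpoint 0 Hit at ...") are skipped.
TRACE_LINE = re.compile(r"^\s*A:[0-9A-Fa-f]{2}\b.*?\$([0-9A-Fa-f]{4}):")


def program_counters(lines):
    """Return the executed addresses found in a trace log, in order."""
    pcs = []
    for line in lines:
        match = TRACE_LINE.match(line)
        if match:
            pcs.append(int(match.group(1), 16))
    return pcs


def first_divergence(good_lines, bad_lines):
    """Index of the first instruction where the two runs differ, or None."""
    good = program_counters(good_lines)
    bad = program_counters(bad_lines)
    for index, (g, b) in enumerate(zip(good, bad)):
        if g != b:
            return index
    if len(good) != len(bad):
        # One log is a strict prefix of the other.
        return min(len(good), len(bad))
    return None
```

Feeding it the two logs (for example `first_divergence(open("good.log"), open("bad.log"))`, file names hypothetical) gives you the instruction count at which to start reading, which is usually far more manageable than the whole frame.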

7 minutes ago, 0xDEAFC0DE said:

So what I would do in this scenario is set a conditional break for when the direction is set to an incorrect value, then turn on the trace logger, and finally try to reproduce the bug. When the breakpoint triggers, look through the trace logger and try to determine where this incorrect value is coming from. After you start to get an idea of how it works, you'll also want to compare that to a working execution of the code.

I've got save states set up where the bug happens and also where it doesn't (in the open room with Lanmolas) and I've tried stepping through them, but there's hundreds of lines of code happening between calls to what I'll call the "direction setting" code and at my level of understanding I can't quickly follow what they're doing, especially not knowing the significance of the memory addresses they're updating. I've tried trace logging a little bit as well, having seen some explanations of it on YouTube videos about debugging with Mesen/FCEUX, but the trace logs I get, even for ~1 frame of execution, are thousands and thousands of lines long, so it seems pretty insurmountable to notice the differences.

Any suggestions are welcome! I'd love to be able to find the offending code and potentially solve it. I know I'm in the ballpark, but I'm definitely struggling with following the game logic.

14 minutes ago, Khromak said:

I've got save states set up where the bug happens and also where it doesn't (in the open room with Lanmolas) and I've tried stepping through them, but there's hundreds of lines of code happening between calls to what I'll call the "direction setting" code and at my level of understanding I can't quickly follow what they're doing, especially not knowing the significance of the memory addresses they're updating. I've tried trace logging a little bit as well, having seen some explanations of it on YouTube videos about debugging with Mesen/FCEUX, but the trace logs I get, even for ~1 frame of execution, are thousands and thousands of lines long, so it seems pretty insurmountable to notice the differences.

Any suggestions are welcome! I'd love to be able to find the offending code and potentially solve it. I know I'm in the ballpark, but I'm definitely struggling with following the game logic.

Yeah, trace logging is only useful for going back tens of lines of code. You definitely don't want to follow it from the beginning of the frame. I'll give you an example from some reversing work I did with Blue Marlin for the NES weekly competition. (I use FCEUX because that's what I'm comfortable with, but most things should apply to Mesen as well). This probably isn't the best example as it's pretty complex, so feel free to skim the code parts. The more important thing is the general steps.

One of the things I wanted to figure out was how the type of fish was chosen when you hook one. After some experimenting with the RAM search (which I've found to be much better in FCEUX, but I digress) I found that the type is stored at $05B0. So, like I said, I set a write breakpoint, turn on the trace logger, and catch a fish.

The first section of code I see looks like this:

A:05 X:00 Y:08 S:F9 P:nvUbdizc       $B7B2:A0 00     LDY #$00
A:05 X:00 Y:00 S:F9 P:nvUbdiZc       $B7B4:A5 6D     LDA $006D = #$05
A:05 X:00 Y:00 S:F9 P:nvUbdizc       $B7B6:D9 CC 05  CMP $05CC,Y @ $05CC = #$00
A:05 X:00 Y:00 S:F9 P:nvUbdizC       $B7B9:90 03     BCC $B7BE
A:05 X:00 Y:00 S:F9 P:nvUbdizC       $B7BB:C8        INY
A:05 X:00 Y:01 S:F9 P:nvUbdizC       $B7BC:D0 F8     BNE $B7B6
... (several more loops here)
A:05 X:00 Y:07 S:F9 P:nvUbdizC       $B7B6:D9 CC 05  CMP $05CC,Y @ $05D3 = #$0F
A:05 X:00 Y:07 S:F9 P:NvUbdizc       $B7B9:90 03     BCC $B7BE
Breakpoint 0 Hit at $B7BE: $05B0:EC-W--
A:05 X:00 Y:07 S:F9 P:NvUbdizc       $B7BE:8C B0 05  STY $05B0 = #$06

I tend to read the trace log from bottom to top. The last instruction stores Y in the fish type location ($05B0). So, how was Y set? Scanning over the code, we see that Y is initialized to zero and then incremented in a loop from $B7B6 to $B7BC. The loop exits when $05CC,Y is greater than the value at $006D. It might take you a little while to get there from reading the code, so what I like to do is convert the assembly to pseudo code so it is easier to understand. Something like this:

Y = 0
do {
    if($05CC,Y > $006D)
        break
    Y++
} while(Y != 0)
// if you couldn't already tell, I'm a C++ guy by heart
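
If it helps, the same loop can also be written as something you can actually run and poke at. This is only a Python sketch of my reading of the trace above: the function name is made up, `ram` is just a stand-in for the console's 2 KB of RAM, and the table values in the usage note are invented except for the two that appear in the trace.

```python
def pick_fish_type(ram):
    """Mirror the loop at $B7B2-$B7BE on a bytearray standing in for RAM.

    Assumption: ram is at least 0x800 bytes long, so $05CC + Y is always
    a valid index even if Y runs all the way up to $FF.
    """
    roll = ram[0x006D]              # LDA $006D
    y = 0                           # LDY #$00
    while True:
        # CMP $05CC,Y / BCC $B7BE: leave when roll < table entry
        if roll < ram[0x05CC + y]:
            break
        # INY / BNE $B7B6: Y is 8 bits, so wrapping to 0 also ends the loop
        y = (y + 1) & 0xFF
        if y == 0:
            break
    ram[0x05B0] = y                 # STY $05B0
    return y
```

For example, with `ram = bytearray(0x800)`, `ram[0x006D] = 0x05` and `ram[0x05D3] = 0x0F` (every other table entry left at zero, which is a made-up choice), `pick_fish_type(ram)` returns 7, matching the Y value stored at the end of the trace.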

This is a good first step, but now the question is how are $006D and the array at $05CC set? Well from here we have to look a little bit farther up in the trace log. I'm going to skip going through the next section of code since it's a bit long, but after a lot of conversion to pseudo code I realized that $006D is set from the random number generator. Going further up the Trace log we see code that sets the array at $05CC. Since the trace log follows the branches, sometimes it's better to look at the full code in the debugger. So here's the full relevant code section (again skim through it if necessary).

 00:B770:A0 08     LDY #$08
 00:B772:B9 C2 B7  LDA $B7C2,Y @ $B7C9 = #$02
 00:B775:2D 95 05  AND $0595 = #$02
 00:B778:F0 18     BEQ $B792
 00:B77A:B1 10     LDA ($10),Y @ $B82B = #$0A
 00:B77C:F0 14     BEQ $B792
 00:B77E:B1 12     LDA ($12),Y @ $B846 = #$00
 00:B780:30 09     BMI $B78B
 00:B782:18        CLC
 00:B783:71 10     ADC ($10),Y @ $B82B = #$0A
 00:B785:90 0B     BCC $B792
 00:B787:A9 FF     LDA #$FF
 00:B789:D0 07     BNE $B792
 00:B78B:18        CLC
 00:B78C:71 10     ADC ($10),Y @ $B82B = #$0A
 00:B78E:B0 02     BCS $B792
 00:B790:A9 00     LDA #$00
 00:B792:99 CC 05  STA $05CC,Y @ $05D3 = #$0F
 00:B795:88        DEY
 00:B796:10 DA     BPL $B772
// converted to pseudo code (this takes a while to do, so don't get discouraged if you don't understand how I got to here)
for(Y = 8; Y >= 0; Y--){ // DEY / BPL: the loop also runs for Y = 0
    // after some watching of $0595 I determine this is checking if the fish type matches the method of catch
    // (don't worry about this if unfamiliar with the game)
    A = $B7C2,Y & $0595
    if(A == 0) goto $B792
    
    A = ($10),Y
    if(A == 0) goto $B792
    
    A = ($12),Y
    A += ($10),Y // clamped between 0 and #$FF if an overflow/underflow happens
    
$B792:
    $05CC,Y = A
}
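
The clamping is the fiddly part of that listing, so here is a small Python sketch of how I read the two ADC paths. All names are hypothetical, the inputs are plain lists standing in for the ROM tables behind $B7C2, ($10) and ($12), and it assumes the byte from ($12),Y is a signed modifier, which is what the BMI split suggests.

```python
def to_signed(byte):
    """Interpret an 8-bit value as two's complement (BMI tests bit 7)."""
    return byte - 0x100 if byte & 0x80 else byte


def clamped_chance(base, modifier):
    """base is the byte from ($10),Y; modifier is the byte from ($12),Y."""
    if base == 0:                        # LDA ($10),Y / BEQ $B792
        return 0
    total = base + to_signed(modifier)
    # Positive modifier: carry out of the ADC forces $FF.
    # Negative modifier: no carry out means the sum went below zero,
    # which forces $00.
    return max(0x00, min(0xFF, total))


def build_chance_table(type_masks, catch_method, base_chances, modifiers):
    """Fill the 9-entry table the way the loop at $B770-$B796 appears to."""
    table = [0] * 9
    for y in range(8, -1, -1):           # Y = 8 down to 0 inclusive
        if type_masks[y] & catch_method == 0:
            table[y] = 0                 # AND result of 0 is stored as-is
        else:
            table[y] = clamped_chance(base_chances[y], modifiers[y])
    return table
```

So a base of $F0 with a modifier of $20 saturates at $FF, and a base of $05 with a modifier of $F6 (that is, -10) bottoms out at $00 instead of wrapping around.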

From the pseudo code, I realized that this code can be further boiled down to "$05CC,Y = ($10),Y + ($12),Y" (the $0595 stuff can be ignored for simplicity). Looking into the values of $10 and $12, I see that they are pointers into ROM, and at those addresses I see some lookup tables, with other similar-looking data around where $10 and $12 point while debugging. So, I concluded that the chances for each fish type are stored in ROM, with the table pointed to by $10 being the base chance and the one pointed to by $12 being some modifier. Going further up the trace log, I found that $10 and $12 are derived from $00A2 and $05AC respectively. So, when $00A2 changes, a different set of base chances is used.

Now at this point I didn't see anything in the next couple of lines above that deals with $00A2 and $05AC, so I added those to my RAM watch to see if I could get them to change. Eventually I discovered that $00A2 changes with the fishing location, and it makes sense that the fishing location would influence the fish type spawning rates. However, I was never able to find where $05AC changes. Even adding a write breakpoint didn't pick anything up. So, with that one I concluded that the developers never finished implementing that part.

Hopefully I didn't lose you there as I'm not the best explainer, but here's a general summary of the techniques I used. Once you find where an important address is being modified, look higher up in the trace log to see where it comes from. Often you might need to convert the assembly to pseudo code to better understand it. When you see an unknown address you have two major options: look further up in the trace log to see if it was recently set, or watch it in RAM watch to try and figure out what it does/represents (or you can also try modifying it to see if you notice anything different).

Quick update, I've read a TON of code, lots of guides and references on the interwebs, and I've managed to come up with a Game Genie code which fixes this bug and doesn't seem to affect the behavior of the Lanmola noticeably (though I'm sure it affects them a little bit) and doesn't affect any other part of the gameplay that I've been able to find by wandering around the dungeon/world map a bit with some breakpoints on.

I'd still like to try to find a fix which I can apply to the ROM to fix it more permanently, but this is still a big milestone, so I'm excited about that.

Hopefully another update coming in the future that I've found a more permanent fix. I've got more code to read!

OK so I figured out a way to do it within the code and was working on making a long list of Game Genie codes to make the fixes, but the codes keep getting screwed up. Any advice on using Game Genie for this? I was going to put this code into an empty place in the ROM (B470 was filled with all FF so I figured it was unused):

CMP #$07    C9 07
BNE $B477    D0 03
LDA #$0B    A9 0B
STA $050F    8D 0F 05
JMP $AAB8    4C B8 AA

And then I had to update AAB5 to use this code: JMP $B470 (4C 70 B4).

Here are the codes I was trying to use, in order (first AAB5 changes then the others). I think the problem is something to do with bank switching, because when I look at AAB5 it's not always the same code. Sometimes it's the code I'm trying to modify, and other times it's some other nonsense.

GKLZSX
ANLZVZ
KULZNZ
OGYLAK
YAYLPG
EIYLZG
LAYLLG
OZYLGK
LAYLIK
SAYLTK
YAYLYK
IAYUAG
GGYUPK
ELYUZK
XZYULK

Edited by Khromak

3 minutes ago, 0xDEAFC0DE said:

Is there a reason you are using Game Genie codes instead of making an IPS patch?

Uh. How to make an IPS patch?

Oh, also I found the problem with my codes above. The BNE was wrong because the instructions in Mesen weren't the same length as the ones I found online. Code 7 (LAYLLG) should be ZAYLLG and it seems to be working now.
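
For anyone wanting to double-check codes like these, a small decoder makes it easy to see which byte each code writes and where. The Python sketch below assumes the standard 6-letter NES Game Genie encoding (letter order APZLGITYEOXUKSVN); the function name is made up, and it deliberately doesn't handle 8-letter codes.

```python
# Each letter stands for one 4-bit value, in this order (assumed standard).
GG_LETTERS = "APZLGITYEOXUKSVN"


def decode_game_genie(code):
    """Decode a 6-letter NES Game Genie code into (cpu_address, value)."""
    if len(code) != 6:
        raise ValueError("only 6-letter codes are handled in this sketch")
    n = [GG_LETTERS.index(ch) for ch in code.upper()]
    # The address and data bits are scattered across the six nibbles.
    address = (0x8000
               | ((n[3] & 7) << 12)
               | ((n[5] & 7) << 8) | ((n[4] & 8) << 8)
               | ((n[2] & 7) << 4) | ((n[1] & 8) << 4)
               | (n[4] & 7) | (n[3] & 8))
    value = (((n[1] & 7) << 4) | ((n[0] & 8) << 4)
             | (n[0] & 7) | (n[5] & 8))
    return address, value
```

With this, GKLZSX decodes to $AAB5 = $4C, LAYLLG to $B473 = $03 and ZAYLLG to $B473 = $02. In other words the fix changes the BNE operand from 03 to 02, so the branch lands on $B476, where the STA starts in the listing above, instead of $B477 in the middle of it. Note that a 6-letter code has no compare byte, so it patches whatever bank happens to be mapped at that address, which fits the bank-switching weirdness seen at AAB5.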

Edited by Khromak

Just now, Khromak said:

Uh. How to make an IPS patch?

An IPS patch basically just describes the difference between two binary files, which makes it perfect for small fixes like this where you don't want to distribute the full ROM. To create one, you take the original unmodified ROM along with your patched ROM, feed them to an IPS patcher, and it will generate an IPS patch. I have used Lunar IPS, which worked fine. Then whoever wants to use the patch feeds the IPS patcher the IPS file along with the original ROM, and it will generate the patched ROM.
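
To make the format less mysterious: an IPS file is the ASCII header "PATCH", then a series of records (3-byte big-endian file offset, 2-byte length, then that many bytes of data), and finally the ASCII footer "EOF". Here's a minimal Python sketch of both directions. It's an illustration rather than a replacement for a real patcher: the function names are made up, the writer assumes both files are the same size and only emits plain records, and it ignores the corner case of a record starting at offset 0x454F46 (which would read as "EOF").

```python
def make_ips(original: bytes, patched: bytes) -> bytes:
    """Build a minimal IPS patch that turns `original` into `patched`."""
    if len(original) != len(patched):
        raise ValueError("this sketch only handles same-size files")
    if len(patched) > 0x1000000:
        raise ValueError("IPS offsets are limited to 24 bits")
    out = bytearray(b"PATCH")
    i = 0
    while i < len(patched):
        if original[i] == patched[i]:
            i += 1
            continue
        start = i
        # Collect one run of differing bytes (a record holds at most 0xFFFF).
        while (i < len(patched) and original[i] != patched[i]
               and i - start < 0xFFFF):
            i += 1
        out += start.to_bytes(3, "big")
        out += (i - start).to_bytes(2, "big")
        out += patched[start:i]
    out += b"EOF"
    return bytes(out)


def apply_ips(original: bytes, patch: bytes) -> bytes:
    """Apply an IPS patch, including RLE records, to `original`."""
    if patch[:5] != b"PATCH":
        raise ValueError("not an IPS patch")
    data = bytearray(original)
    pos = 5
    while patch[pos:pos + 3] != b"EOF":
        if pos + 5 > len(patch):
            raise ValueError("truncated IPS patch")
        offset = int.from_bytes(patch[pos:pos + 3], "big")
        size = int.from_bytes(patch[pos + 3:pos + 5], "big")
        pos += 5
        if size == 0:
            # RLE record: 2-byte run length followed by 1 fill byte.
            run = int.from_bytes(patch[pos:pos + 2], "big")
            chunk = bytes([patch[pos + 2]]) * run
            pos += 3
        else:
            chunk = patch[pos:pos + size]
            pos += size
        end = offset + len(chunk)
        if end > len(data):
            data.extend(b"\x00" * (end - len(data)))
        data[offset:end] = chunk
    return bytes(data)
```

Keep in mind the offsets are positions in the ROM file, not CPU addresses, so the patch is tied to the exact ROM dump it was made from.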

Oh yeah, for hacks like this always go IPS. Don't use Game Genie codes unless you can do the fix in three codes or fewer, since that's the most an actual Game Genie allows on hardware. That's cool for people who have one, but if you're targeting emulators anyway, IPS patches are the way to go 😃
