
Database Construction: Exploring the Remainder of Our Solver Data

This post was originally published on February 13, 2020, on my personal website, Lukich.io. I have since consolidated all of my poker-related content by reposting it onto Solver School.

Today, I plan to jump back into my overall research project. For the past month, my PC has consistently run solves to build up a database of flop scenarios. Last week, I took a mini-break to evaluate my progress. While I feel good about the overall direction, I realized I had done a poor job of documenting my data inputs.

It is easy to become disorganized when building databases for a larger effort like this one, and poor documentation can lead to major problems in any software or technology development. Given that all of my flop research will be derived from this input data set, understanding it is of paramount importance when analyzing any outputs or findings.

In starting the project, I selected the 32 formations I wanted to explore (I started with 33 but realized during this effort that I had duplicated one). I have kicked off scripts to solve the 184-flop subset, one formation at a time. What I didn’t do was make sure I was consistent and comprehensive in setting up the tree configurations.

My goal has been to clean this up over the last week. I organized all formations logically and audited all ranges and tree configurations. I found a few small input errors, which I corrected. I also found a couple of related formations (e.g. BB 3-bet vs BTN call and BB call vs BTN open) with overlapping ranges. Fixing these inconsistencies will result in better, more comprehensive analyses. As such, it was a valuable exercise to prioritize.

Now that I have an organizational system, I’m finalizing the last of the unsolved formations. I don’t have many left. However, the ones that I do have take longer to complete. These are the higher SPR and wider range formations, such as a BTN open vs BB call. I expect that to be complete over the next week.

Solver Data Outputs

In the meantime, I wanted to revisit the remainder of the raw data for this analysis. In my prior post, where I explored success metrics, I displayed a graphic of a database table.

PioSolver output for a series of boards given a formation (e.g. 3-bet pot EP vs IP)

As I mentioned in that post, the image above is a 10-row sample from a table within the database I’m building. This is the output that PioSolver generates for each flop solve for every formation. In that first post, I began defining my success metrics — Equity, Expected Value, and Equity Realization — as these are ultimately the values I will use to derive strategies. However, I didn’t explore the rest of the data within these tables. So today, in the spirit of documentation, I want to revisit my foundational data sets and define the input values.

First, I want to fully define the columns in the data table above. While I plan to have 10 data tables, as detailed below, all will follow this general format with similar types of column headers. As a result, examining one will provide enough definition to extrapolate to all.

I defined column 4 (OOP EQ) through column 9 (IP EQR) in that prior post. Even though this consists of six columns, it only represents three unique metrics. The OOP and IP prefixes denote the players in the hand (out-of-position = OOP; in-position = IP).
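
The relationship between these three metrics can be shown with a small calculation. Equity realization is commonly computed as EV divided by the player's equity share of the pot; the pot size and row values below are invented for illustration, not taken from the actual table.

```python
def equity_realization(ev, equity, pot):
    """EQR: the fraction of its pot-share equity a range actually realizes.
    Commonly computed as EV / (equity * pot)."""
    return ev / (equity * pot)

# Hypothetical flop-root values for a 5.5bb pot; note OOP EV + IP EV = pot
pot = 5.5
oop_eq, oop_ev = 0.46, 2.20
ip_eq, ip_ev = 0.54, 3.30

oop_eqr = equity_realization(oop_ev, oop_eq, pot)  # ~0.87: under-realizes
ip_eqr = equity_realization(ip_ev, ip_eq, pot)     # ~1.11: over-realizes
```

An EQR below 1 means a range captures less than its raw equity share, which is typical for the out-of-position player.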

While the other columns aren’t success measures, they all are critical components within the database. These columns will either round out our analyses by adding dimensionality layers or help structure the database for indexing purposes.

The formal definitions for these columns are below:

  • Formation - This represents the file name for each 184-flop formation. To simplify outputs, I use similar strategic actions for common formations (detailed in the section below). This lets me include multiple formations in the same data table, increasing the efficiency of any calculations.

  • Flop - This represents the individual flop that is solved. I’ll also build out a table in a future post that defines the characteristics of each of the 184 flops. This will greatly improve the dimensionality we can add to our analyses and drive more specific and actionable insights.

  • Global % - This represents the percentage of the time that this node of the game tree is reached. All values are 100% in the example above because these rows occur at the flop root node. However, as we move further into the game tree, the Global % value will decrease. For example, if we were IP facing a check/raise, the value would be much lower because different strategic actions could have been chosen up to that point. The check/raiser could have led out on the flop. We could have chosen to check back. Or he could have check/called or check/folded. Each of these decisions leads down a pathway to a different game tree node.

  • Bet # / Check - I group these because these values depend on one another. They represent the frequencies of strategic actions for each board at equilibrium and will always sum to 100%. The number after “Bet” represents the bet size as a percentage of the pot (in the example above, 50% pot). There may be multiple Bet # columns to represent multiple bet sizings.

  • Bet # EV / Check EV - These column values are also grouped and will align with the bullet above. They represent the EV for each of the strategic actions discussed. Our overall EV will be between these two values as it is a weighted average based on the frequencies.
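
The last three bullets can be tied together numerically: a node's overall EV is the frequency-weighted average of its Bet and Check EVs, and Global % compounds by multiplying the frequencies of each action along the path to a node. A minimal sketch with invented frequencies and EVs:

```python
def overall_ev(actions):
    """Frequency-weighted average EV across a node's strategic actions.
    `actions` maps action name -> (frequency, ev); frequencies sum to 1."""
    assert abs(sum(f for f, _ in actions.values()) - 1.0) < 1e-9
    return sum(f * ev for f, ev in actions.values())

# Hypothetical root-node strategy: bet 50% pot at a 62% frequency
node = {"Bet 50": (0.62, 3.40), "Check": (0.38, 3.05)}
ev = overall_ev(node)  # 3.267 -- always between the Check EV and Bet EV

# Global % at a deeper node is the product of the action frequencies
# taken to reach it, e.g. IP facing a check/raise:
# OOP checks (70%) -> IP bets (55%) -> OOP raises (12%)
global_pct = 0.70 * 0.55 * 0.12  # 0.0462, i.e. ~4.6% of hands reach here
```

This is why the flop root rows all show 100%: no actions have been taken yet, so there is nothing to multiply.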

Now that I have defined the data inputs, I will zoom out to examine the entire set of data tables that make up the database.

Database Organization

As I built the trees for the 32 formations, I chose strategic actions for each. More specifically, I had to select each player's bet size(s).

As described above, table columns — Bet #, Check, Bet # EV, Check EV — are directly related to the actions we choose. The report output will reflect the new inputs if I vary strategic actions. Therefore, to simplify our database for analysis, I will choose to minimize differences in my tree configurations.

Despite the desire for homogeneity, I can’t choose the same tree configuration for all formations. Two distinct formation characteristics will greatly inform our strategic inputs — our position and how we arrived at the flop.

The 2x2 grid below demonstrates these two dimensions. The y-axis that separates the rows shows the differences between our two possible positions — in-position and out-of-position. The x-axis separating the columns shows the differences between the two ways we can arrive at the flop — the aggressor (offense) or the defender (defense).

The 2x2 grid above represents our strategic actions by position and how we arrived at the flop (i.e., which player had the initiative).

Our position affects the order in which we’ll act on the flop:

  • The out-of-position player always acts first, and the in-position player acts second.

  • For in-position solves, I want to explore my betting splits and frequencies against a lead. This means I will need to build 2 tables for in-position solves.

  • For out-of-position solves, I want to explore my betting splits, my opponent’s betting frequency if I check, and my response to a bet. This means I will need to build 3 tables for out-of-position solves.

Our initiative determines the strength of our range and how sophisticated our betting strategy can be:

  • For offensive formations, we are usually uncapped while our opponent is capped. This means that we’ll have a range advantage on most boards. As a result, I can build a more complex strategy and choose multiple bet sizes.

  • Our range will mostly be capped for defensive formations, putting us at a range disadvantage on most boards. Given our more condensed range, splitting ranges effectively will be more difficult. As a result, I will choose a single bet sizing.

  • Because of the difference in strategic actions, I will also need to build different tables for in-position and out-of-position.

Ultimately, I must create 10 tables — 2 in-position offense, 2 in-position defense, 3 out-of-position offense, and 3 out-of-position defense.
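
One way to realize these ten tables is a single schema template parameterized by the number of bet sizes: multiple Bet columns for offensive trees, one for defensive trees. The sketch below uses SQLite; the table names, column names, and bet sizes are my own invention for illustration, not the author's actual schema.

```python
import sqlite3

def create_report_table(conn, name, bet_sizes):
    """Create one report table; each bet size gets a frequency column and
    an EV column, mirroring the PioSolver report layout described above."""
    bet_cols = ", ".join(
        f'"bet_{s}_freq" REAL, "bet_{s}_ev" REAL' for s in bet_sizes
    )
    conn.execute(f"""
        CREATE TABLE "{name}" (
            formation TEXT,
            flop TEXT,
            global_pct REAL,
            oop_eq REAL, oop_ev REAL, oop_eqr REAL,
            ip_eq REAL, ip_ev REAL, ip_eqr REAL,
            {bet_cols},
            check_freq REAL, check_ev REAL,
            PRIMARY KEY (formation, flop)
        )
    """)

conn = sqlite3.connect(":memory:")
# Offensive trees get multiple sizings; defensive trees get a single one
create_report_table(conn, "ip_offense_root", bet_sizes=[33, 75])
create_report_table(conn, "ip_defense_vs_lead", bet_sizes=[50])
```

Keying each table on (formation, flop) keeps multiple formations in one table, which matches the efficiency goal described in the Formation bullet above.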

If I wanted to make this more complex, another potential dimension I could examine would be single-raised pots vs 3-bet pots. I thought about delineating this in my setup but ultimately chose not to, keeping the same framework for inputs as displayed above. While equilibrium solution outputs can (and will) vary significantly between the two pot types, the positional and range factors that lead to the configurations above generally apply to both.

Conclusion

Database planning and documentation can sometimes be tedious. As a result, it’s often an overlooked step. This might be fine for smaller projects, but it’s a necessary component of any significant effort. If a calculation or spreadsheet error occurs, the ability to trace problems back to the core data set makes debugging much easier.

Balancing the desire to be comprehensive with scalability is tricky. While I want to be as specific as possible to the nuances of different spots, I also want to build an analysis that can scale without requiring me to treat every system as custom. Ultimately, I landed on the four strategic configurations detailed above based on position and our initiative going into the flop. This lets me customize my analysis while maintaining the ability to scale effectively. More importantly, it helps me develop better, actionable strategies that I can implement on the table.

In my next post, I plan to begin examining dimensionality. Now that I have these data tables defined, I need to find ways to analyze our key metrics. Segmentation is an important component of data analysis. While we can learn a lot from the aggregate, organizing data into like-groups can help to develop better insights that can be utilized to drive strategy. One of the most obvious ways to segment our flop study is by the various characteristics that define a flop, such as its suitedness, the card values, or its connectedness. I’ll examine existing categorization systems and define some of my own.

If you have any comments or thoughts, please leave them below. You can also contact me at [email protected] or on Twitter or YouTube through the links in the footer below. Thanks for reading.

-Lukich
