View Full Version : Minor League supplement to Lahman in the works.



Jeff Olsen
10-06-2009, 02:33 PM
With Stage One almost complete, I thought I'd give the public a little preview of what I've been working on these past few months.

Master file:

# ID IDM IDH DOBY DOBM DOBD DOBC DOBS DOBCI DDY DDM DDD DDC DDS DDCI First Last Note Given Nick Weight Height Bats Throws CSY CEY College 40ID 45ID RetroID HoltzID BBRefID Draft Sign D/S Team
1 agostju01 1958 2 23 P.R. Rio Piedras Juan Agosto Juan Roberto (Gonzalez) 190 74 L L 1975 1996 agostju01 agostju01 agosj001 agostju01 agostju01 1974 BOS
2 ahearpa01 1969 12 10 USA CA San Francisco Pat Ahearne Patrick Howard 195 75 R R 1992 2007 Pepperdine ahearpa01 ahearpa01 aheap001 ahearpa01 ahearpa01 1992 DET

It's nearly identical to Lahman's master but the start and end years reflect each player's entire pro career after being signed by a Major League team. The three additional columns specify when each guy was drafted, or signed if not drafted, and the team which signed him.

The reason I'm posting now is to drum up assistance for Stage Two: breaking down the career years into single season lines. Like so:
ID YEAR SEQ AFF LG LVL POS POS 2
agostju01 1978 1 BOS AL A 1
agostju01 1980 1 CHW AL A 1
agostju01 1981 1 CHW AL AAA 1
agostju01 1981 2 CHW AL MLB 1
agostju01 1982 1 CHW AL AAA 1
agostju01 1982 2 CHW AL MLB 1
agostju01 1983 1 CHW AL AAA 1
agostju01 1983 2 CHW AL MLB 1
agostju01 1984 1 CHW AL MLB 1
agostju01 1985 1 CHW AL AAA 1
agostju01 1985 2 CHW AL MLB 1
agostju01 1986 1 CHW AL MLB 1
agostju01 1986 2 MIN AL AAA 1
agostju01 1986 3 MIN AL MLB 1
agostju01 1987 1 HOU NL AAA 1
agostju01 1987 2 HOU NL MLB 1
agostju01 1988 1 HOU NL MLB 1
agostju01 1989 1 HOU NL MLB 1
agostju01 1990 1 HOU NL MLB 1
agostju01 1991 1 STL NL MLB 1
agostju01 1992 1 STL NL MLB 1
agostju01 1992 2 SEA AL AAA 1
agostju01 1992 3 SEA AL MLB 1
agostju01 1993 1 SDP NL AAA 1
agostju01 1993 2 HOU NL AAA 1
agostju01 1993 3 HOU NL MLB 1
agostju01 1996 1 PIT NL AAA 1
ahearpa01 1992 1 DET AL A 1
ahearpa01 1993 1 DET AL A 1
ahearpa01 1994 1 DET AL AA 1
ahearpa01 1995 1 DET AL AAA 1
ahearpa01 1995 2 DET AL MLB 1
ahearpa01 1996 1 --- North. IND 1
ahearpa01 1996 2 NYM NL AAA 1
ahearpa01 1996 3 LAD NL A 1
ahearpa01 1996 4 LAD NL AA 1
ahearpa01 1997 1 LAD NL AA 1
ahearpa01 1997 2 LAD NL AAA 1
ahearpa01 1998 1 --- Atl. IND 1
ahearpa01 1999 1 --- Atl. IND 1
ahearpa01 1999 2 SEA AL AA 1
ahearpa01 2000 1 SEA AL AAA 1
ahearpa01 2001 1 FLA NL AAA 1
ahearpa01 2002 1 --- Atl. IND 1
ahearpa01 2002 2 DET AL AAA 1
ahearpa01 2003 1 DET AL AA 1
ahearpa01 2003 2 DET AL AAA 1
ahearpa01 2004 1 DET AL AAA 1
ahearpa01 2005 1 --- Atl. IND 1
ahearpa01 2006 1 --- Atl. IND 1
ahearpa01 2007 1 --- Atl. IND 1

This will allow me to concentrate on Stage Three: a complete Master supplement for everyone who played in MLB since 1901, to eventually be followed by Stage Four.
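
For anyone who would rather script this than work by hand, here is a minimal Python sketch of reading season lines like the ones above. It assumes the lines are saved as comma-separated text with the header ID,YEAR,SEQ,AFF,LG,LVL,POS,POS 2. The function name `read_season_lines` and the `SAMPLE` text are illustrative only and are not part of the actual Career file.

```python
import csv
import io

# Three of Agosto's 1986 lines from the example, written as CSV (assumed layout).
SAMPLE = """ID,YEAR,SEQ,AFF,LG,LVL,POS,POS 2
agostju01,1986,1,CHW,AL,MLB,1,
agostju01,1986,2,MIN,AL,AAA,1,
agostju01,1986,3,MIN,AL,MLB,1,
"""


def read_season_lines(text):
    """Parse CSV text into a list of dicts, converting YEAR and SEQ to int."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["YEAR"] = int(row["YEAR"])
        row["SEQ"] = int(row["SEQ"])
        rows.append(row)
    return rows


lines = read_season_lines(SAMPLE)
for line in lines:
    print(line["ID"], line["YEAR"], line["SEQ"], line["AFF"], line["LVL"])
```

Keeping each line as a dict keyed by column name means extra columns (NOTE, POS 3, and so on) can be added later without changing the reading code.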

Jeff Olsen
10-06-2009, 02:41 PM
Before anyone asks: yes, Clay already knows about this. In fact, I've sent him a version of the single season lines specifically for guys who converted from pitcher-to-batter and vice versa.

Jeff Olsen
10-06-2009, 03:07 PM
Stage One will be posted in this thread in a few days. It is being compiled primarily from The Baseball Cube and double-checked against Baseball Reference, because TBC sometimes confuses players with similar names, is missing or misattributes some draft/sign data, and lacks minor league data prior to 1978. I expect those who would like to assist with Stage Two to do the same and follow my established format.

filihok
10-06-2009, 04:03 PM
Is this as much work as I think it is going to be?

Jeff Olsen
10-06-2009, 04:58 PM
Stage One contains about 5000 players. I'm hoping to divvy up the work amongst a bunch of people; at least one person per AAA team would be nice.

Open Office, Microsoft Excel, or some other spreadsheet or database program is highly recommended.

Gocardinals
10-06-2009, 06:09 PM
I could probably do the Memphis Redbirds.

Jeff Olsen
10-06-2009, 07:01 PM
Thanks! Memphis is one that I've completed for the Master. Anyone with Major League experience not listed here is on the Master list elsewhere.

I did Ankiel to get you started. Feel free to fill in '06 if you know his situation.
ID YEAR SEQ AFF LG LVL NOTE POS POS 2 POS 3 POS 4 POS 5 POS 6
ankieri01 1998 1 STL NL A 1
ankieri01 1999 1 STL NL AA 1
ankieri01 1999 2 STL NL AAA 1
ankieri01 1999 3 STL NL MLB 1
ankieri01 2000 1 STL NL MLB 1
ankieri01 2001 1 STL NL R 1
ankieri01 2001 2 STL NL AAA 1
ankieri01 2001 3 STL NL MLB 1
ankieri01 2002 1 STL NL DL
ankieri01 2003 1 STL NL AA 1
ankieri01 2004 1 STL NL A 1
ankieri01 2004 2 STL NL AA 1
ankieri01 2004 3 STL NL AAA 1
ankieri01 2004 4 STL NL MLB 1
ankieri01 2005 1 STL NL A 10
ankieri01 2005 2 STL NL AA 10
ankieri01 2006 1 STL NL ?
ankieri01 2007 1 STL NL AAA 10
ankieri01 2007 2 STL NL MLB 9 8 7
ankieri01 2008 1 STL NL MLB 8 7 9
ankieri01 2009 1 STL NL MLB 8 9 7
10 denotes OF.

gosensgo101
10-06-2009, 08:03 PM
I'd definitely like to help out. I'll take Las Vegas if it's simply this year, or Syracuse if we're going back through the years.

I'm curious, will this include both major leaguers and guys who never made it out of the minors?

I'm excited to see where this can go.

Jeff Olsen
10-06-2009, 08:11 PM
I'd definitely like to help out. I'll take Las Vegas if it's simply this year, or Syracuse if we're going back through the years.

Syracuse it is. Thanks!

I'm curious, will this include both major leaguers and guys who never made it out of the minors?

Only players listed in Lahman, so no career minor leaguers are included.

Jeff Olsen
10-06-2009, 08:49 PM
AAA team                       Parent club
Buffalo Bisons                 Cleveland Indians
Charlotte Knights              Chicago White Sox
Columbus Clippers              Washington Nationals
Durham Bulls                   Tampa Bay Devil Rays
Indianapolis Indians           Pittsburgh Pirates
Lehigh Valley IronPigs         Philadelphia Phillies
Louisville Bats                Cincinnati Reds
Norfolk Tides                  Baltimore Orioles
Pawtucket Red Sox              Boston Red Sox
Richmond Braves                Atlanta Braves
Rochester Red Wings            Minnesota Twins
Scranton/Wilkes-Barre Yankees  New York Yankees
Syracuse Chiefs                Toronto Blue Jays
Toledo Mud Hens                Detroit Tigers

Albuquerque Isotopes           Florida Marlins
Colorado Springs Sky Sox       Colorado Rockies
Fresno Grizzlies               San Francisco Giants
Iowa Cubs                      Chicago Cubs
Las Vegas 51s                  Los Angeles Dodgers
Memphis Redbirds               St. Louis Cardinals
Nashville Sounds               Milwaukee Brewers
New Orleans Zephyrs            New York Mets
Oklahoma RedHawks              Texas Rangers
Omaha Royals                   Kansas City Royals
Portland Beavers               San Diego Padres
Round Rock Express             Houston Astros
Sacramento River Cats          Oakland Athletics
Salt Lake Bees                 Los Angeles Angels
Tacoma Rainiers                Seattle Mariners
Tucson Sidewinders             Arizona Diamondbacks

I started this project with Toledo because they have been around for so long, so that's the largest group of alumni. Groups tend to shrink going down the list, with the Yankees currently being the least represented because S/W-B has such a short history.

I'm on Omaha right now.

Gocardinals
10-06-2009, 09:01 PM
Question about Brian Barden: How should I classify short season A and advanced A? Just as Single-A, or leave them alone?

BTW, Rick Ankiel spent the entire 2006 season on the DL.

Jeff Olsen
10-06-2009, 09:10 PM
Question about Brian Barden: How should I classify short season A and advanced A? Just as Single-A, or leave them alone?

Just Single-A since the game makes no distinction.

Rick Ankiel spent the entire 2006 season on the DL.

Hm. I wonder why BBRef lists him as injured for 2002 but doesn't say a thing for 2006.

Gocardinals
10-07-2009, 07:54 AM
Should I list DH as 11, since outfield is 10?

Jeff Olsen
10-07-2009, 09:55 AM
Should I list DH as 11, since outfield is 10?

DH should have a 0 for their position, same for guys who were only used as PH/PR. Forgot to mention that.
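
To keep the position codes straight, here is a small Python lookup sketch. The codes 1 (pitcher), 10 (OF) and 0 (DH, PH/PR) come from this thread; codes 2 through 9 are assumed to follow standard scorekeeping numbering, which the thread implies but does not spell out. The names `POSITION_CODES` and `describe_positions` are made up for illustration.

```python
# 0, 1 and 10 are stated in the thread; 2-9 assume standard scorekeeping numbers.
POSITION_CODES = {
    0: "DH/PH/PR",
    1: "P",
    2: "C",
    3: "1B",
    4: "2B",
    5: "3B",
    6: "SS",
    7: "LF",
    8: "CF",
    9: "RF",
    10: "OF",
}


def describe_positions(codes):
    """Translate a list of numeric position codes into abbreviations."""
    return [POSITION_CODES[code] for code in codes]


# Ankiel's 2008 line lists 8 7 9 in POS, POS 2, POS 3.
labels = describe_positions([8, 7, 9])
print(labels)
```

An unknown code raises a KeyError here on purpose, so a typo in a position column gets noticed instead of silently passing through.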

filihok
10-08-2009, 02:39 PM
How much time, roughly, does it take to do a team?

I MAY take on the 51's. I will probably need some guidance as well, as I don't know too much about the CSV files and how exactly they work. Which is why I might volunteer to do it.

Jeff Olsen
10-08-2009, 03:17 PM
How much time, roughly, does it take to do a team?

Depends on how far down the list it is. Teams toward the bottom tend to have smaller groups of alumni because players who are also alumni for other teams are grouped with the team that is the highest on the list. Except for Toledo, as stated above.
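
The grouping rule above can be sketched in a few lines of Python. This is only an illustration under assumptions: the team order is a shortened stand-in for the real list (with Toledo moved to the front, since the project started there), the player names are hypothetical, and `assign_alumni_groups` is not a function from any existing tool.

```python
def assign_alumni_groups(team_order, player_teams):
    """Map each player to the earliest team in team_order that they played for."""
    rank = {team: index for index, team in enumerate(team_order)}
    groups = {}
    for player, teams in player_teams.items():
        known = [team for team in teams if team in rank]
        if known:
            # The team with the lowest rank is the one highest on the list.
            groups[player] = min(known, key=rank.__getitem__)
    return groups


# Shortened, assumed ordering; Toledo goes first because it was done first.
order = ["Toledo", "Buffalo", "Las Vegas", "Memphis"]

# Hypothetical players and the AAA teams they appeared for.
players = {
    "player_a": {"Memphis", "Buffalo"},
    "player_b": {"Las Vegas", "Toledo"},
    "player_c": {"Memphis"},
}

groups = assign_alumni_groups(order, players)
print(groups)
```

This also shows why groups lower on the list come out smaller: anyone shared with a team higher up has already been claimed.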

The one you're interested in is right above Memphis so Gocardinals should be able to give you an idea of how long it would take.
I MAY take on the 51's. I will probably need some guidance as well, as I don't know too much about the CSV files and how exactly they work. Which is why I might volunteer to do it.

First thing you need is a spreadsheet program. Open Office (http://www.openoffice.org/) Calc is what I use and it's free. It is possible to edit CSV files in a text editor but not at all recommended.

Then it's a matter of copying each player's ID from your team's segment of the Master file as needed into a copy of the Career file and building the season lines according to each player's spot in the parent team's system and the position(s) played for each season. The primary position goes in POS; any additional positions go in POS 2, POS 3, and so on down the line as needed.

If you're still interested, I can post the 51s' segment tonight.

Gocardinals
10-08-2009, 03:44 PM
I won't be able to tell you how much time it takes, because some players have longer careers than others. Pitchers also take less time because they very rarely play positions other than pitcher, so you can just type in "1" in the position slot (still check, just to be safe).

Jeff Olsen
10-12-2009, 11:35 PM
Stage 1 is complete and has been added to the OP. Alumni groups are marked by the team's city so they're easy to find. Anyone wishing to help should download that file and look for the team/group size ratio that appeals to them.

Jeff Olsen
10-13-2009, 09:40 PM
If anyone spots an error in the Master, please let me know. I just found that I put Dave Stieb's career starting three decades earlier than it should! A corrected version has been sent to Clay.

GOYANKSGONJ
10-14-2009, 01:19 AM
If filihok doesn't end up taking on the 51s, I will.

If he does, I'll take another PCL team that's not already accounted for.

filihok
10-14-2009, 01:55 PM
I've (just) started the 51's.

Where are people getting this data from?
I'm using BBR but I don't know an easy way to get the sequence data. Also, a player that goes back and forth between levels, how are we coding that?



alomaro01 1985 1 CHR NL A 4 6
alomaro01 1986 1 RNO NL A 4
alomaro01 1987 1 WCH NL AA 6
alomaro01 1988 1 LSV NL AAA 4
alomaro01 1988 2 SDP NL MLB 4
alomaro01 1989 1 SDP NL MLB 4
alomaro01 1990 1 SDP NL MLB 4 6
alomaro01 1991 1 TOR AL MLB 4
alomaro01 1992 1 TOR AL MLB 4
alomaro01 1993 1 TOR AL MLB 4
alomaro01 1994 1 TOR AL MLB 4
alomaro01 1995 1 TOR AL MLB 4
alomaro01 1996 1 BAL AL MLB 4
alomaro01 1997 1 BAL AL MLB 4
alomaro01 1998 1 BAL AL MLB 4
alomaro01 1999 1 CLE AL MLB 4
alomaro01 2000 1 CLE AL MLB 4
alomaro01 2001 1 CLE AL MLB 4
alomaro01 2002 1 NYM NL MLB 4
alomaro01 2003 1 NYM NL MLB 4
alomaro01 2003 2 CHW AL MLB 4
alomaro01 2004 ??? TUC NL AAA 4
alomaro01 2004 1 ARZ NL MLB 4
alomaro01 2004 2 CHW AL MLB 4

this look ok? Besides the ???'s of course

Jeff Olsen
10-14-2009, 04:17 PM
I've (just) started the 51's.

Thanks!

Where are people getting this data from?
I'm using BBR but I don't know an easy way to get the sequence data.

Yeah, BBRef can be confusing, particularly when players moved around a lot. It's easier to start with TBC (http://www.thebaseballcube.com/teams/alumni/10274.shtml) and double-check against BBRef in case TBC has something wrong (http://www.thebaseballcube.com/players/D/Bill-Doran.shtml). TBC is often missing Independent league data from the 1990s and minor league data from before 1978.

Also, a player that goes back and forth between levels, how are we coding that?

For Roberto's 2004 season, make AAA sequence 1 and MLB sequence 2.

this look ok? Besides the ???'s of course

You should be using the parent clubs' abbreviations instead of the farm teams' but otherwise that's fine. By the way, Arizona should be "ARI"; don't want to confuse the game when this gets implemented. :)
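
A quick way to catch farm-team codes or misspelled abbreviations before sending lines in is to check the AFF column against a list of accepted codes. The Python sketch below assumes rows laid out as ID, YEAR, SEQ, AFF, LG, LVL. The `KNOWN_AFF` set only contains abbreviations that already appear in this thread, so it is incomplete, and `unknown_affiliations` is a hypothetical helper name.

```python
# Parent-club codes seen in this thread only (incomplete), plus "---" for no affiliation.
KNOWN_AFF = {
    "ARI", "BAL", "BOS", "CHW", "CLE", "DET", "FLA", "HOU", "LAD",
    "MIN", "NYM", "PIT", "SDP", "SEA", "STL", "TOR", "---",
}


def unknown_affiliations(rows):
    """Return the sorted set of AFF values (4th column) not in KNOWN_AFF."""
    return sorted({row[3] for row in rows if row[3] not in KNOWN_AFF})


# A few of the Alomar lines as first posted, with farm-team and "ARZ" codes.
rows = [
    ["alomaro01", 1988, 1, "LSV", "NL", "AAA"],
    ["alomaro01", 1991, 1, "TOR", "AL", "MLB"],
    ["alomaro01", 2004, 1, "ARZ", "NL", "MLB"],
]

flagged = unknown_affiliations(rows)
print(flagged)
```

Anything flagged still needs a person to decide what the right parent club is; the check only points at the rows worth a second look.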

filihok
10-14-2009, 04:29 PM
Thanks! Yeah, BBRef can be confusing, particularly when players moved around a lot. It's easier to start with TBC (http://www.thebaseballcube.com/teams/alumni/10274.shtml) and double-check against BBRef in case TBC has something wrong (http://www.thebaseballcube.com/players/D/Bill-Doran.shtml).

Yeah...that'll be much easier


For Roberto's 2004 season, make AAA sequence 1 and MLB sequence 2.

Shouldn't it be AAA=1, AZ=2, and CHW=3 or am I not understanding something? Actually, it shouldn't even be that. I am confused

If a player starts in MLB, goes to AAA is that AAA=2, MLB=1?
How about a player that goes back and forth?


You should be using the parent clubs' abbreviations instead of the farm teams' but otherwise that's fine. By the way, Arizona should be "ARI"; don't want to confuse the game when this gets implemented. :)

Good, that'll be easier

Jeff Olsen
10-14-2009, 04:41 PM
Shouldn't it be AAA=1, AZ=2, and CHW=3 or am I not understanding something? Actually, it shouldn't even be that.

Ah! Didn't notice CHW in there. Okay, given that he ended the previous season with CHW: CHW=1, ARI AAA=2, ARI MLB=3.

If a player starts in MLB, goes to AAA is that AAA=2, MLB=1?

Yes, if they switch parent clubs mid-season like Roberto in 2004.

How about a player that goes back and forth?

For the same parent? Just lump similar levels together.
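
One possible reading of the sequence rule, sketched in Python: walk a player's stints for one season in chronological order, lump repeat visits to the same parent club and level into the first one, and number what is left. The season used here is hypothetical and `number_stints` is just an illustrative name, so treat this as a sketch of the idea and not the official method.

```python
def number_stints(stints):
    """Assign SEQ numbers to (affiliate, level) stints given in chronological order.

    Repeat visits to the same affiliate and level are lumped into the first one.
    """
    seen = {}
    for aff, level in stints:
        if (aff, level) not in seen:
            seen[(aff, level)] = len(seen) + 1
    return [(aff, level, seq) for (aff, level), seq in seen.items()]


# Hypothetical season: shuttling between AAA and MLB, then a move to a new parent club.
season = [
    ("CHW", "AAA"),
    ("CHW", "MLB"),
    ("CHW", "AAA"),
    ("CHW", "MLB"),
    ("MIN", "AAA"),
]

numbered = number_stints(season)
for aff, level, seq in numbered:
    print(seq, aff, level)
```

The chronological order of the stints still has to come from somewhere, such as transaction records; the code only handles the lumping and numbering.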

Jeff Olsen
10-14-2009, 07:16 PM
Don't forget to use the Note column for when a guy missed an entire season. Like when Agosto was a free agent:
agostju01 1979 1 --- --- --- FA
I didn't include Notes when I initially laid out the columns so it's not in the example at the top of the thread.
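
To find seasons that need a Note line, one could list the years inside a player's career span that have no season line yet. The Python sketch below assumes rows laid out as ID, YEAR, SEQ, AFF, LG, LVL, NOTE. The function name `missing_seasons` and the "?" placeholder in the NOTE slot are my own assumptions; the real note (FA, DL, etc.) has to be looked up by hand.

```python
def missing_seasons(player_id, first_year, last_year, years_with_lines):
    """Return placeholder rows for years in the span that have no season line."""
    rows = []
    for year in range(first_year, last_year + 1):
        if year not in years_with_lines:
            # NOTE is left as "?" so a person fills in FA, DL, etc. after checking.
            rows.append([player_id, year, 1, "---", "---", "---", "?"])
    return rows


# Years that have lines in the Agosto example at the top of the thread.
agosto_years = {1978} | set(range(1980, 1994)) | {1996}

gaps = missing_seasons("agostju01", 1978, 1996, agosto_years)
for row in gaps:
    print(row)
```

For Agosto this flags 1979, which matches the free-agent line above, along with 1994 and 1995, which would still need checking.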

Jeff Olsen
10-14-2009, 08:04 PM
Oh, and BBRef's Transaction data is very useful for determining the sequence for teams. Like Matt Anderson (http://www.thebaseballcube.com/players/A/Matt-Anderson.shtml) in 2006; he started the season on the Giants, they released him in July, then Bridgeport picked him up for a while, and he spent all of 2007 as a free agent.

Yeah, I started working on Toledo last night. :)

Jeff Olsen
01-14-2010, 10:12 PM
Replaced the Master.csv with a corrected one.

gosensgo101
01-14-2010, 10:23 PM
Hmm, I never got around to this. Still need me to do what I claimed I was going to do?

Jeff Olsen
01-18-2010, 04:50 PM
Any help would be greatly appreciated.

Gocardinals
01-18-2010, 09:17 PM
Sorry I haven't been working on the Memphis Redbirds lately (not that you knew); the computer it was on got moved to my brother's room so I can't work on it as much.

Jeff Olsen
01-19-2010, 09:25 AM
No problem. Building the year-by-year database is pretty tedious, particularly when working on non-pitchers. One needs to balance one's time with other stuff. I'm still in Toledo's C-names.

Jeff Olsen
02-22-2010, 02:54 PM
Hi, guys. I think it would be most beneficial to just include ID, YEAR, SEQ, AFF, LG, LVL, and NOTE for now. I want Clay to be able to use those ASAP so Position data can wait.

Thanks again for volunteering to help!
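
If you have already filled in position columns, they don't need to be deleted by hand. Here is a minimal Python sketch that keeps only ID, YEAR, SEQ, AFF, LG, LVL, and NOTE from CSV text. It assumes the file has a header row with those exact names; `trim_columns` and the sample text are illustrative only.

```python
import csv
import io

KEEP = ["ID", "YEAR", "SEQ", "AFF", "LG", "LVL", "NOTE"]


def trim_columns(text):
    """Return CSV text containing only the KEEP columns, in that order."""
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    # extrasaction="ignore" drops any columns that are not listed in KEEP.
    writer = csv.DictWriter(
        out, fieldnames=KEEP, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


sample = (
    "ID,YEAR,SEQ,AFF,LG,LVL,NOTE,POS,POS 2\n"
    "ankieri01,2007,1,STL,NL,AAA,,10,\n"
)

trimmed = trim_columns(sample)
print(trimmed)
```

The position data stays in your own working copy, so nothing is lost when the trimmed version is sent along.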

Jeff Olsen
05-03-2010, 09:35 AM
The Master file is basically complete and is now in Clay's hands. As I compile the year-by-year data, I will continue to fill holes in players' vitals and fix data that Lahman has incorrect.