Hatchling Cockatrice Quiconque, posted August 26, 2010:
I have a table. Each row of the table has the correct contents for its row, but the columns do not all have the correct contents for their column. However, each cell's contents begin with a label for the correct column. For example, the first column is "CREATE", and every single row has a cell that reads "CREATE: xx" or "CREATE: whatever"; as it happens, those are all in the first column. But for the other columns, the cells are scattered: if the third column is "ID", one row might have a cell that reads "ID: 23" in the second column and another might have "ID: 35" in the fifth column. The table is very large (about 1800 rows), so sorting by hand is not a good option. Obviously a computer could do this much faster and without trouble, but I can't find an easy way to do it. Short of writing my own code to do it, which in my case would be no better than sorting by hand, does anyone know of a nice shortcut that will take care of this for me?
Hatchling Cockatrice Quiconque (author), posted August 26, 2010:
Clarification: I should add that not all of the cells/labels exist in every row. For example, there might be a row that just doesn't have an "ID: zz" cell; that row should just have an empty cell in the "ID" column. So a simple sort of the cells within each row won't work, either.
Well-Actually War Trall Niemand, posted August 26, 2010:
So the problem is you have something that looks like:
Code:
```
| CREATE    | FOO    | ID     | BAR    |
| CREATE: A | ID: 7  |        | FOO: y |
| CREATE: B | BAR: ? | FOO: g | ID: 12 |
```
and you want to turn it into
Code:
```
| CREATE    | FOO    | ID     | BAR    |
| CREATE: A | FOO: y | ID: 7  |        |
| CREATE: B | FOO: g | ID: 12 | BAR: ? |
```
The task is involved enough that I can't imagine any way to do it without at least a little bit of programming to describe it, although I wouldn't call that nearly as bad as doing the job by hand, given you have more than a thousand rows and several columns. How is the data currently stored? Plain text? Delimited how? It seems like all that's required is:
Code:
```
Read first row
Create an empty map/dictionary
For each item in first row
    Add the (key,value) pair (item,index) to the map
Write the first row back out
For each remaining row
    Create an array with length equal to the number of columns
    For each item on the row
        Look up the item's prefix in the map
        Assign the item to the indicated index in the array
    Write out the array as the new row
```
I just couldn't resist, so in between other things this morning, I threw together something which can turn my first example table into the second. I suppose I could have written it in Python or something, so that a compiler isn't needed, but that would have taken a lot longer.
Code:
```cpp
#include <iostream>
#include <iomanip>
#include <algorithm>
#include <map>
#include <string>
#include <vector>
#include <sstream>

// Return a copy of s with leading and trailing whitespace removed.
std::string trim(std::string s, const std::string& whitespace = " \t\n\r"){
    s = s.erase(s.find_last_not_of(whitespace) + 1);
    return(s.erase(0, s.find_first_not_of(whitespace)));
}

// Wrapper so a string can be streamed centered in a field of a given width.
struct center{
    const std::string& str;
    unsigned int width;
    center(const std::string& s, unsigned int w): str(s), width(w){}
};

std::ostream& operator <<(std::ostream& os, const center& c){
    unsigned int padBefore = std::max(((int)c.width - (int)c.str.size()) / 2, 0);
    unsigned int padAfter = std::max((int)c.width - (int)padBefore - (int)c.str.size(), 0);
    os << std::string(padBefore, ' ') << c.str << std::string(padAfter, ' ');
    return(os);
}

// Under the Cling interpreter (which defines __CLING__) the entry point is
// fix_table(); when compiled normally it is main().
#ifdef __CLING__
void fix_table(){
#else
int main(){
#endif
    const char delim = '|';       // cell separator; '\t' for tab-delimited input
    const char prefixMark = ':';  // separates a cell's label from its value
    std::string line, item;
    std::map<std::string, unsigned int> labels; // column label -> column index
    std::vector<unsigned int> widths;           // untrimmed width of each header cell

    //examine the header line
    getline(std::cin, line);
    std::istringstream ss(line);
    while(getline(ss, item, delim)){
        unsigned int width = item.size();
        if((item = trim(item)).empty())
            continue;
        labels.insert(std::make_pair(item, labels.size()));
        widths.push_back(width);
        std::cout << delim << center(item, widths.back());
    }
    std::cout << delim << std::endl;

    //sort each record line
    while(getline(std::cin, line)){
        ss.clear();
        ss.str(line);
        std::vector<std::string> newline(labels.size()); // one empty slot per column
        //read each entry and put it into place in the array
        while(getline(ss, item, delim)){
            if((item = trim(item)).empty())
                continue;
            size_t prefixEnd = item.find(prefixMark);
            if(prefixEnd == std::string::npos)
                continue; //drop items without prefixes
            std::map<std::string, unsigned int>::const_iterator it = labels.find(item.substr(0, prefixEnd));
            if(it == labels.end())
                continue; //drop items with unknown prefixes
            newline[it->second] = item;
        }
        //print the sorted array
        for(unsigned int i = 0; i < newline.size(); i++)
            std::cout << delim << center(newline[i], widths[i]);
        std::cout << delim << std::endl;
    }
#ifndef __CLING__
    return(0);
#endif
}
```
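The placement step in the program above (look up each cell's label, drop it into that column, leave unmatched columns blank) is the part that handles the missing-cell case from the clarification. As a sketch only, not Niemand's code, it can be pulled out into a pure function that is easy to check on its own; the name `place_cells` is hypothetical, and it assumes the cells have already been split and trimmed and that the map's indices run from 0 to `labels.size() - 1`.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: put each "LABEL: value" cell into the column whose
// header is LABEL. Columns with no matching cell stay as empty strings;
// cells with no prefix or an unknown prefix are dropped, as in the program above.
// Assumes the indices stored in `labels` are 0 .. labels.size()-1.
std::vector<std::string> place_cells(const std::map<std::string, unsigned int>& labels,
                                     const std::vector<std::string>& cells,
                                     char prefixMark = ':')
{
    std::vector<std::string> row(labels.size()); // one empty slot per column
    for (const std::string& cell : cells) {
        std::string::size_type prefixEnd = cell.find(prefixMark);
        if (prefixEnd == std::string::npos)
            continue; // no label at all
        std::map<std::string, unsigned int>::const_iterator it =
            labels.find(cell.substr(0, prefixEnd));
        if (it == labels.end())
            continue; // label not present in the header
        row[it->second] = cell;
    }
    return row;
}
```

With the header from the example table (`CREATE`=0, `FOO`=1, `ID`=2, `BAR`=3), the first data row `{"CREATE: A", "ID: 7", "FOO: y"}` comes back as `{"CREATE: A", "FOO: y", "ID: 7", ""}`, which is the second table's first row with its empty `BAR` cell.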
Hatchling Cockatrice Quiconque (author), posted August 26, 2010:
Whoa. Thanks, Niemand! A few questions, if you're still interested:
1) For when I attempt to compile this: what language is it? I recognize some bits as C, but there are other bits that don't look like C, or at least not the incarnations of C I was once familiar with.
2) Where does it read from? In other words, do I feed it a text file somehow, am I expected to paste the table into a console window, or what?
3) Is it usable with any standard delimitation (e.g., tab-delimited (my current format), CSV, or whatever), or do I need to convert to spaces and pipes? Correspondingly, I see the use of whitespace in the code; does that mean spaces within cells that contain text strings (there are some) will break it?
Easygoing Eyebeast Dintiradan, posted August 26, 2010:
In case Niemand doesn't read this for a while, here are the answers to parts of your questions. I've had very little exposure to C++ in particular, so don't take my word for things:
1) See above. Pretty sure it's just normal C++. I've always used g++ to compile C++ code (it's the C++ equivalent of gcc). You might be able to find it preinstalled on a Mac; I dunno.
2) Looks like Niemand is using standard input and output. That means you would run the program from the shell as follows:
Code:
```
$ path-to-program < path-to-plaintext-input-file > path-to-desired-output-file
```
3) You should be able to change the values of the delim and prefixMark constants to whatever you want ('\t' is the symbol for tab). I'd have to give the program a bit more of a read to see how Niemand is dealing with whitespace.
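On the whitespace question: the program's `trim` only strips whitespace from the two ends of each cell, so spaces inside a cell survive. A minimal sketch of splitting one tab-delimited line the same way shows this; `split_cells` is a hypothetical name, and its trimming is a simplified re-implementation of the program's `trim`, not a copy of it.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split one line on delim, trim each cell's ends,
// and skip cells that are empty after trimming (as the program above does).
std::vector<std::string> split_cells(const std::string& line, char delim = '\t')
{
    const std::string whitespace = " \t\n\r";
    std::vector<std::string> cells;
    std::istringstream ss(line);
    std::string item;
    while (std::getline(ss, item, delim)) {
        std::string::size_type first = item.find_first_not_of(whitespace);
        if (first == std::string::npos)
            continue; // empty or all-whitespace cell
        std::string::size_type last = item.find_last_not_of(whitespace);
        cells.push_back(item.substr(first, last - first + 1)); // interior spaces kept
    }
    return cells;
}
```

For example, the tab-separated line `"CREATE: A\tNAME: Big Rat \t\tID: 7"` (the `NAME` label is made up for illustration) splits into `"CREATE: A"`, `"NAME: Big Rat"` and `"ID: 7"`: the inner space in "Big Rat" is kept, the trailing one is dropped, and the empty cell between the two adjacent tabs disappears.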
Hatchling Cockatrice Quiconque (author), posted August 26, 2010:
Thanks to you too, Dintiradan! I somehow missed the delim and prefix declarations entirely. Whoops. Yeah, that would answer that question!
g++ was EXACTLY the help I needed -- I was using gcc before and couldn't get it to compile in terminal. (Xcode would compile it, but with no window or terminal, so that was useless.) Now I get the following errors (an improvement at least). Any ideas?
EDIT: Okay, that was silly of me. Forget the previous errors. g++ seems to be compiling it successfully, but I can't do anything with the resulting output file, which I would expect to be an executable, right?
Code:
```
Macintosh-3:~ slartucker$ g++ niemand.cpp -o testniemand
Macintosh-3:~ slartucker$
Macintosh-3:~ slartucker$ testniemand
-bash: testniemand: command not found
Macintosh-3:~ slartucker$ testniemand testin testout
-bash: testniemand: command not found
```
Hatchling Cockatrice Quiconque (author), posted August 26, 2010:
Okay, apparently the OS X terminal is a bit weird with file paths. Now I can run the program, but I get this:
Code:
```
Macintosh-3:~ slartucker$ /Users/slartucker/testniemand
||^C
Macintosh-3:~ slartucker$ /Users/slartucker/testniemand testin testout
||||||^C
```
I assume the pipes come from the program and that something about the I/O isn't working quite right. Any thoughts?
EDIT: Whoops, success! That was the program running. It just reads from the command prompt and not from files. Okay, let's see if pasting 1800 lines is going to overflow the terminal buffer....
EDIT #2: Whoops, Dintiradan's < and > to the rescue. You guys rock. This makes my life so much better. Thank you both so much!
Easygoing Eyebeast Dintiradan, posted August 26, 2010:
If you're in the same directory as your program, you can run it with:
Code:
```
./testniemand
```
Also, you need to use the angle brackets (greater-than and less-than signs) to redirect standard input and output to files. Otherwise, standard input will read from your keyboard and standard output will print to the shell/command prompt/terminal/whatever you want to call it. (Without any angle brackets, 'testin' and 'testout' are passed as command-line arguments, which this program never looks at.)
EDIT: Sniped by Slarty's edit. So, are these stats Spidweb-related?
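If you wanted the `testniemand testin testout` calling style to work as well, one option is to read the input file name from the command line and fall back to standard input when none is given. This is a hypothetical sketch, not part of Niemand's program; `choose_input` is an illustrative name.

```cpp
#include <fstream>
#include <iostream>

// Hypothetical helper: return the input file named in argv[1] if one was
// given, otherwise std::cin (the original program's behaviour).
// Returns nullptr if a file name was given but could not be opened.
// `file` must stay alive as long as the returned pointer is used.
std::istream* choose_input(int argc, char* argv[], std::ifstream& file)
{
    if (argc < 2)
        return &std::cin; // no argument: read standard input as before
    file.open(argv[1]);
    if (!file)
        return nullptr;   // caller should report the error
    return &file;
}
```

To use it, `main` would become `int main(int argc, char* argv[])`, declare a `std::ifstream file;`, call `choose_input(argc, argv, file)`, and read from the returned stream wherever the program currently reads `std::cin`. The same pattern with `std::ofstream` and `argv[2]` would cover the output file. The `<` and `>` redirection keeps working either way.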
Well-Actually War Trall Niemand, posted August 26, 2010:
Glad it worked, Slarty. Unfortunately, you picked this up and started trying to make it work five minutes after I went off to a meeting, but luckily Dinti was on his toes. I wish I hadn't been so stupid as to edit out the sentence originally in my post that said it was C++, since that would have removed one source of confusion. Also, some sort of usage instructions might have been a good idea. Anyway, I was really expecting that this would need more tweaking than just setting those character constants, so I'm happy that it actually got the job done.
Hatchling Cockatrice Quiconque (author), posted August 26, 2010:
Okay, new problem. It works FANTASTIC on small test files. The actual table in question is 55 columns by 1812 rows. When I use that file (including several attempted reformats to make sure there were no weird characters or anything), the program only works partially. As far as I can tell, it copies the entire file to the output file, and it gets rid of empty cells (i.e., two tabs in a row -- I did change the constant to tab). But it won't change the order of cells otherwise. Any thoughts on how I can troubleshoot this?
Well-Actually War Trall Niemand, posted August 26, 2010:
Huh. That's pretty weird. Possibly it's having issues with properly reading the header line? Actually, I don't think even that would do it. Does the problem kick in after a certain row or number of rows? If it still does the wrong thing with, say, the first two rows of data (headers and one actual entry), it would probably be easiest for me to debug it myself, if you didn't mind sending me that subset of the data (which could be obfuscated in any way you see fit, as long as the obfuscated data still fails to be properly reordered). (My email's niemandcw@gmail.com if you want to send anything that way.) This particular failure mode is really bizarre, since in order for a cell to be printed back out, the program has to have read the cell, decided that it knew which column the cell belonged in, and put it there.
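One quick check in this spirit is whether the program even sees the rows as separate lines. `std::getline` splits only on `'\n'`, so a file whose lines end in a bare `'\r'` (the classic Mac OS convention) would be read as a single giant "header line". That would fit the symptom of the whole file coming back unchanged except for dropped empty cells, though the thread does not confirm which line endings the editor actually wrote. A hypothetical diagnostic that tallies the line-ending styles in the input:

```cpp
#include <istream>
#include <sstream>
#include <string>

// Counts of each line-ending style seen in a stream.
struct LineEndingCounts {
    unsigned long lf = 0;   // "\n"   (Unix / Mac OS X)
    unsigned long crlf = 0; // "\r\n" (Windows)
    unsigned long cr = 0;   // bare "\r" (classic Mac OS)
};

// Hypothetical diagnostic: read the stream character by character
// and tally the line endings it contains.
LineEndingCounts count_line_endings(std::istream& in)
{
    LineEndingCounts counts;
    char c;
    while (in.get(c)) {
        if (c == '\n') {
            ++counts.lf;
        } else if (c == '\r') {
            if (in.peek() == '\n') {
                in.get(c); // consume the '\n' of a "\r\n" pair
                ++counts.crlf;
            } else {
                ++counts.cr;
            }
        }
    }
    return counts;
}

// Convenience overload for text that is already in memory.
LineEndingCounts count_line_endings(const std::string& text)
{
    std::istringstream in(text);
    return count_line_endings(in);
}
```

Calling `count_line_endings(std::cin)` from a small `main` and running it as `./program < table.txt` would show at a glance whether the file has roughly 1812 `lf` endings or a pile of bare `cr` ones instead.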
Hatchling Cockatrice Quiconque (author), posted August 26, 2010:
Got it to work again! The culprit was Mac OS text formatting (line breaks, I'm guessing) that TextEdit seems to forcibly put on files it saves, even if they did not begin with Mac OS formatting. Using TextWrangler instead saved the day. (Pasting 800k characters into pico did *not* work.)
Well-Actually War Trall Niemand, posted August 26, 2010:
Okay then. Strange that that would have been it; my test file was saved from TextEdit, and while the program shouldn't have been particularly sensitive to line endings, my implicit assumption was that they would be in the Unix/modern Mac OS style.
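If you'd rather have the program tolerate any line-ending style than re-save the file in another editor, one option is to replace the two `getline(std::cin, line)` calls with a reader that accepts `"\n"`, `"\r\n"`, or a bare `"\r"` as the end of a line. This is a sketch under that assumption; `getline_any` and `split_lines` are hypothetical names, not standard functions. The per-cell `getline(ss, item, delim)` calls can stay as they are.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical replacement for std::getline(in, line) that treats "\n",
// "\r\n" and a bare "\r" all as line terminators. Like std::getline it
// returns the stream, so it works as a while-loop condition.
std::istream& getline_any(std::istream& in, std::string& line)
{
    line.clear();
    char c;
    while (in.get(c)) {
        if (c == '\n')
            return in;
        if (c == '\r') {
            if (in.peek() == '\n')
                in.get(c); // swallow the '\n' of a "\r\n" pair
            return in;
        }
        line += c;
    }
    // End of input: a final line without a terminator still counts,
    // as with std::getline, so clear failbit (eofbit stays set).
    if (!line.empty())
        in.clear(in.rdstate() & ~std::ios::failbit);
    return in;
}

// Convenience wrapper (also hypothetical): split in-memory text into lines.
std::vector<std::string> split_lines(const std::string& text)
{
    std::istringstream in(text);
    std::vector<std::string> lines;
    std::string line;
    while (getline_any(in, line))
        lines.push_back(line);
    return lines;
}
```

In the program, `getline(std::cin, line)` would become `getline_any(std::cin, line)` in both places. The trade-off is reading one character at a time, which is slower than `std::getline` but unlikely to matter for an 1812-row table.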
Hatchling Cockatrice Quiconque (author), posted August 26, 2010:
Addendum: pasting the data into pico also worked; it just took about 15 minutes for Terminal to handle the paste. The More You Know... :-D
Thanks again to both of you so much -- this has saved me a huge amount of work. Also, this program will be very useful the next time Jeff releases a game that I care about that has a legible definitions file! mwahaha...
Understated Ur-Drakon Celtic Minstrel, posted August 27, 2010:
Quote (originally posted by CRISIS on INFINITE SLARTIES): g++ was EXACTLY the help I needed -- I was using gcc before and couldn't get it to compile in terminal. (Xcode would compile it, but with no window or terminal, so that was useless.)
g++ and gcc are actually the same compiler, but when run as g++ it automatically links with the standard C++ library, which it doesn't do when run as gcc. So if you were to run it as gcc, you'd have to pass an argument telling it to link with the standard C++ library (`-lstdc++` for GCC's own library).