Table conversion problem


Slawbug

I have a table. Each row of the table has the correct contents for its row, but the columns do not all have the correct contents for their column. However, each cell's contents begin with a label for the correct column.

 

For example, the first column is "CREATE" and every single row has a cell that reads "CREATE: xx" or "CREATE: whatever"... as it happens, those are all in the first column. But for the other columns, the component cells are scattered: thus if the third column is "ID" various rows might have a cell that reads "ID: 23" in the second column or "ID: 35" in the fifth column.

 

The table is very large (about 1800 rows) so sorting by hand is not a good option. Obviously this task CAN be accomplished much faster and without trouble by a computer. But I can't find an easy way to do this. Short of writing my own code to do it, which in my case would be no better than sorting by hand, does anyone know of a nice shortcut that will take care of this for me?


Clarification: I should add that not all of the cells/labels exist in every row. For example, there might be a row that just doesn't have an "ID: zz" cell -- that row should just have an empty cell in the "ID" column. So a simple sort of cells within the row won't work, either.


So the problem is you have something that looks like:

 

Code:
| CREATE    |   FOO    |    ID       |   BAR      |
| CREATE: A |   ID: 7  |             |   FOO: y   |
| CREATE: B |   BAR: ? |    FOO: g   |  ID: 12    |

 

and you want to turn it into

 

Code:
| CREATE    |   FOO    |     ID       |   BAR   |
| CREATE: A |   FOO: y |    ID: 7     |         |
| CREATE: B |   FOO: g |    ID: 12    |  BAR: ? |

 

The task is involved enough that I can't imagine any way to do it without at least a little bit of programming to describe it, although I wouldn't call that nearly as bad as doing the job by hand, given you have more than a thousand rows and several columns. How is the data currently stored? Plain text? Delimited how? It seems like all that's required is:

 

Code:
Read first row
Create an empty map/dictionary
For each item in first row
	Add the (key,value) pair (item,index) to the map
Write the first row back out
For each remaining row
	Create an array with length equal to the number of columns
	For each item on the row
		Look up the item's prefix in the map
		Assign the item to the indicated index in the array
	Write out the array as the new row

 

I just couldn't resist, so in between other things this morning, I threw together something which can turn my first example table into the second. I suppose I could have written it in Python or something, so that a compiler isn't needed, but that would have taken a lot longer.

Code:
#include <iostream>
#include <iomanip>
#include <algorithm>
#include <map>
#include <string>
#include <vector>
#include <sstream>

std::string trim(std::string s,const std::string& whitespace=" \t\n\r"){
	s=s.erase(s.find_last_not_of(whitespace)+1);
	return(s.erase(0,s.find_first_not_of(whitespace)));
}

struct center{
	const std::string& str;
	unsigned int width;
	center(const std::string& s, unsigned int w):
	str(s),width(w){}
};

std::ostream& operator <<(std::ostream& os, const center& c){
	unsigned int padBefore=std::max(((int)c.width-(int)c.str.size())/2,0);
	unsigned int padAfter=std::max((int)c.width-(int)padBefore-(int)c.str.size(),0);
	os << std::string(padBefore,' ') << c.str << std::string(padAfter,' ');
	return(os);
}

#ifdef __CLING__
void fix_table(){
#else
int main(){
#endif
	const char delim='|';
	const char prefixMark=':';
	
	std::string line, item;
	std::map<std::string,unsigned int> labels;
	std::vector<unsigned int> widths;
	//examine the header line
	getline(std::cin,line);
	std::istringstream ss(line);
	while(getline(ss,item,delim)){
		unsigned int width = item.size();
		if((item = trim(item)).empty())
			continue;
		labels.insert(std::make_pair(item,labels.size()));
		widths.push_back(width);
		std::cout << delim << center(item,widths.back());
	}
	std::cout << delim << std::endl;
	
	//sort each record line
	while(getline(std::cin,line)){
		ss.clear();
		ss.str(line);
		std::vector<std::string> newline(labels.size());
		//read each entry and put it into place in the array
		while(getline(ss,item,delim)){
			if((item=trim(item)).empty())
				continue;
			size_t prefixEnd=item.find(prefixMark);
			if(prefixEnd==std::string::npos)
				continue; //drop items without prefixes
			std::map<std::string,unsigned int>::const_iterator it=labels.find(item.substr(0,prefixEnd));
			if(it==labels.end())
				continue; //drop items with unknown prefixes
			newline[it->second]=item;
		}
		//print the sorted array
		for(unsigned int i=0; i<newline.size(); i++)
			std::cout << delim << center(newline[i],widths[i]);
		std::cout << delim << std::endl;
	}
	
	#ifndef __CLING__
	return(0);
	#endif
}

Whoa. Thanks, Niemand! A few questions, if you're still interested:

 

1) For when I attempt to compile this: what language is it? I recognize some bits as C, but there are other bits that don't look like C, or at least not the incarnations of C I was once familiar with.

 

2) Where does it read in from? In other words, do I feed it a text file somehow, am I expected to paste the table into a console window, or what?

 

3) Is it usable with any standard delimitation (e.g., tab-delimited (current format), CSV, or whatever) or do I need to convert to spaces and pipes? Correspondingly, I see the use of whitespace in the code; does that mean spaces within cells with text string contents (because there are some) will break it?


In case Niemand doesn't read this for a while, here are the answers to parts of your questions. I've had very little exposure to C++ in particular, so don't take my word for things:

 

1) See above. Pretty sure it's just normal C++. I've always used g++ to compile C++ code (it's the C++ equivalent of gcc). You might be able to find it preinstalled on a Mac; I dunno.

 

2) Looks like Niemand is using standard input and output. That means you would run the program from the shell as follows:

Code:
$ path-to-program < path-to-plaintext-input-file > path-to-desired-output-file

 

3) You should be able to change the values of the delim and prefixMark constants to whatever you want ('\t' is the symbol for tab). I'd have to give the program a bit more of a read to see how Niemand is dealing with whitespace.
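
On the whitespace question, here is a minimal sketch of what the program's trim() helper appears to do: it strips whitespace only from the two ends of a cell, so spaces inside a cell's text should survive. The cell value "NAME: Fire Bolt" and the checkTrim() function are made up for illustration and are not part of the posted program.

```cpp
#include <cassert>
#include <string>

// Same logic as the trim() helper in the posted program: remove leading and
// trailing whitespace, leave everything in between untouched.
std::string trim(std::string s, const std::string& whitespace = " \t\n\r") {
    s.erase(s.find_last_not_of(whitespace) + 1);   // cut trailing whitespace
    s.erase(0, s.find_first_not_of(whitespace));   // cut leading whitespace
    return s;
}

// Hypothetical self-check (illustration only).
void checkTrim() {
    // Interior spaces are preserved; only the padding at the ends is removed.
    assert(trim("  NAME: Fire Bolt \t") == "NAME: Fire Bolt");
    // A cell made only of whitespace becomes empty, and the program skips it.
    assert(trim(" \t ").empty());
}
```

Calling checkTrim() from a main() should pass silently if the reading above is right.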


Thanks to you too, Dintiradan!

 

I somehow missed the delim and prefix declarations entirely. Whoops. Yeah, that would answer that question!

 

g++ was EXACTLY the help I needed -- I was using gcc before and couldn't get it to compile in terminal. (Xcode would compile it, but with no window or terminal, so that was useless.)

 

Now I get the following errors (an improvement at least). Any ideas?

 

EDIT: Okay, that was silly of me. Forget the previous errors. g++ seems to be compiling it successfully but I can't do anything with the resulting output file, which I would expect to be an executable, right?

 

Code:
Macintosh-3:~ slartucker$ g++ niemand.cpp -o testniemand
Macintosh-3:~ slartucker$ 
Macintosh-3:~ slartucker$ testniemand
-bash: testniemand: command not found
Macintosh-3:~ slartucker$ testniemand testin testout
-bash: testniemand: command not found

Okay, apparently the OS X terminal is a bit weird with file paths. Now I can run the program, but I get this:

 

Code:
Macintosh-3:~ slartucker$ /Users/slartucker/testniemand
||^C
Macintosh-3:~ slartucker$ /Users/slartucker/testniemand testin testout
||||||^C
I assume that the pipes are related to the program and that something about the I/O isn't working quite right. Any thoughts?

 

EDIT: Whoops -- success! That was the program running. It just reads in from the command prompt and not from files. Okay, let's see if pasting 1800 lines is going to overflow the terminal buffer....

 

EDIT #2: Whoops, Dintiradan's < and > to the rescue. You guys rock. This makes my life so much better. Thank you both so much!


If you're in the same directory as your program, you can run it with:

Code:
./testniemand

 

Also, you need to use the angle brackets (greater-than and less-than signs) to redirect standard input and output to files. Otherwise, standard input will read from your keyboard and standard output will print to the shell/command prompt/terminal/whatever you want to call it.

 

(If you don't have any angle brackets, 'testin' and 'testout' are used as command line arguments.)
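
For anyone who would rather pass file names as arguments, here is a rough sketch of how a program could accept them and still fall back to standard input and output. The posted program does not do this; process(), run() and checkProcess() are hypothetical names, and process() is only a stand-in that copies lines through.

```cpp
#include <cassert>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

// Stand-in for the real table-fixing logic: it only copies lines through,
// so the sketch stays focused on where the input comes from.
void process(std::istream& in, std::ostream& out) {
    std::string line;
    while (std::getline(in, line))
        out << line << '\n';
}

// Hypothetical entry point, meant to be called as run(argc, argv) from main().
// With two file names it reads and writes those files; otherwise it uses
// standard input and output, which is what < and > redirect.
int run(int argc, char** argv) {
    if (argc == 3) {
        std::ifstream in(argv[1]);
        std::ofstream out(argv[2]);
        if (!in || !out) {
            std::cerr << "could not open " << argv[1] << " or " << argv[2] << '\n';
            return 1;
        }
        process(in, out);
        return 0;
    }
    process(std::cin, std::cout);
    return 0;
}

// Self-check: process() does not care whether the stream is a file,
// the keyboard, or an in-memory string.
void checkProcess() {
    std::istringstream in("CREATE: A\tID: 7\n");
    std::ostringstream out;
    process(in, out);
    assert(out.str() == "CREATE: A\tID: 7\n");
}
```

Writing the logic against std::istream and std::ostream is the design choice that lets the same code serve both cases.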

 

EDIT: Sniped by Slarty's edit. So, are these stats Spidweb-related?


Glad it worked, Slarty. Unfortunately, the point when you picked up and started trying to make it work was five minutes after I went off to a meeting, but luckily Dinti was on his toes. I wish I hadn't been so stupid as to edit out the sentence originally in my post that said it was C++, since that would have removed one source of confusion. Also, some sort of usage instructions might have been a good idea.

 

Anyway, I was really expecting that this would need more tweaking than just setting those character constants, so I'm happy that it actually got the job done.


Okay, new problem. It works FANTASTIC on small test files. My actual table in question is 55 columns by 1812 rows. When I use that file (including several attempted reformats to make sure there were no weird characters or anything) the program only works partially. It will copy the entire file, as far as I can tell, to the output file, and it will get rid of any empty cells (i.e., two tabs in a row -- I did change the constant to tab). But it won't change the order of cells otherwise. Any thoughts on how I can troubleshoot this?


Huh. That's pretty weird. Possibly it's having issues with properly reading the header line? Actually I don't think that even that would do it.

 

Does the problem kick in after a certain row or number of rows? If it still does the wrong thing with, say, the first two rows of data (headers and one actual entry), it would probably be easiest for me to debug it myself, if you didn't mind sending me that subset of the data (which could be obfuscated in any way you see fit as long as the obfuscated data still fails to be properly reordered). (My email's niemandcw@gmail.com if you want to send anything that way.)

 

This particular failure mode is really bizarre, since in order for a cell to be printed back out the program has to have read the cell, decided that it knew which column the cell belonged in, and put it there.


Got it to work again! The culprit was MacOS text formatting (about line breaks, I'm guessing) that TextEdit seems to forcibly put on files it saves, even if they did not begin with MacOS formatting. Using TextWrangler instead saved the day.
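
The thread never confirms the exact cause, but one plausible reading is that the file used lone carriage returns ("\r") as line endings. std::getline splits on "\n" by default, so it would then treat the whole file as the header line, which would fit the symptom of cells being copied out without reordering. Under that assumption, a small sketch for normalizing line endings before processing might look like this; normalizeNewlines() and checkNormalize() are hypothetical and not part of the posted program.

```cpp
#include <cassert>
#include <string>

// Convert "\r\n" pairs and lone "\r" characters to "\n".
std::string normalizeNewlines(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '\r') {
            out += '\n';
            if (i + 1 < text.size() && text[i + 1] == '\n')
                ++i;  // skip the '\n' of a "\r\n" pair so it is not doubled
        } else {
            out += text[i];
        }
    }
    return out;
}

// Hypothetical self-check covering all three line-ending styles.
void checkNormalize() {
    assert(normalizeNewlines("a\rb\r\nc\n") == "a\nb\nc\n");
}
```

Saving the file from an editor that writes Unix line endings, as was done here with TextWrangler, achieves the same thing without any code.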

 

(Pasting 800k of characters into pico did *not* work. :p)


Addendum: pasting the data into pico also worked, it just took about 15 minutes for terminal to handle the paste. The More You Know... :-D

 

Thanks again to both of you so much -- this has saved me a huge amount of work.

 

Also, this program will be very useful the next time Jeff releases a game that I care about that has a legible definitions file! mwahaha...


Originally Posted By: CRISIS on INFINITE SLARTIES
g++ was EXACTLY the help I needed -- I was using gcc before and couldn't get it to compile in terminal. (Xcode would compile it, but with no window or terminal, so that was useless.)

g++ and gcc are actually the same compiler, but when run as g++ it automatically links with the standard C++ library, which it doesn't do when run as gcc. So, if you were to run it as gcc, you'd have to pass an argument (such as -lstdc++) to tell it to link with the standard C++ library.